| Title: | Tools to Query the 'Algaebase' Online Database, Standardize Phytoplankton Taxonomic Data, and Perform Functional Group Classifications |
|---|---|
| Description: | Functions that facilitate the use of accepted taxonomic nomenclature, collection of functional trait data, and assignment of functional group classifications to phytoplankton species. Possible classifications include Morpho-functional group (MFG; Salmaso et al. 2015 <doi:10.1111/fwb.12520>) and CSR (Reynolds 1988; Functional morphology and the adaptive strategies of phytoplankton. In C.D. Sandgren (ed). Growth and reproductive strategies of freshwater phytoplankton, 388-433. Cambridge University Press, New York). Versions 2.0.0 and later include new functions for querying the 'algaebase' online taxonomic database (www.algaebase.org); however, these functions require a valid API key that must be acquired from the 'algaebase' administrators. Note that none of the 'algaeClassify' authors are affiliated with 'algaebase' in any way. Taxonomic names can also be checked against a variety of taxonomic databases using the 'Global Names Resolver' service via its API (<https://resolver.globalnames.org/api>). In addition, currently accepted and outdated synonyms, and higher taxonomy, can be extracted for lists of species from the 'ITIS' database via its JSON web service API. The 'algaeClassify' package is a product of the GEISHA (Global Evaluation of the Impacts of Storms on freshwater Habitat and Structure of phytoplankton Assemblages) project, funded by CESAB (Centre for Synthesis and Analysis of Biodiversity) and the U.S. Geological Survey John Wesley Powell Center for Synthesis and Analysis, with data and other support provided by members of GLEON (Global Lake Ecology Observation Network). DISCLAIMER: This software has been approved for release by the U.S. Geological Survey (USGS). Although the software has been subjected to rigorous review, the USGS reserves the right to update the software as needed pursuant to further analysis and review. No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. Furthermore, the software is released on condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use. |
| Authors: | Vijay Patil [aut, cre], Torsten Seltmann [aut], Nico Salmaso [aut], Orlane Anneville [aut], Marc Lajeunesse [aut], Dietmar Straile [aut] |
| Maintainer: | Vijay Patil <[email protected]> |
| License: | CC0 |
| Version: | 2.0.6 |
| Built: | 2026-05-11 11:44:53 UTC |
| Source: | https://github.com/cran/algaeClassify |
Plots change in species richness over time, generates a species accumulation curve (SAC), and compares the SAC against a simulated idealized curve that assumes all unique taxa have equal probability of being sampled at any point in the time series. (author: Dietmar Straile)
accum(b_data, phyto_name = "phyto_name", column = NA, n = 100, save.pdf = FALSE, lakename = "", datename = "date_dd_mm_yy", dateformat = "%d-%m-%y")
b_data |
Name of data.frame object |
phyto_name |
Character string: field containing phytoplankton id (species, genus, etc.) |
column |
column name or number for field containing abundance (biomass,biovol, etc.). Can be NA if the dataset only contains a species list for each sampling date. |
n |
number of simulations for randomized ideal species accumulation curve |
save.pdf |
TRUE/FALSE: should plots be saved to a PDF rather than displayed? |
lakename |
optional character string for adding lake name to pdf output |
datename |
character string name of b_data field containing date |
dateformat |
character string: posix format for datename column |
a two panel plot with trends in richness on top, and cumulative richness vs. simulated accumulation curve on bottom
data(lakegeneva) #example dataset with 50 rows
head(lakegeneva)
accum(b_data=lakegeneva,column='biovol_um3_ml',n=10,save.pdf=FALSE)
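The observed half of a species accumulation curve, the cumulative number of unique taxa across successive sampling dates, can be sketched in base R as follows. This is a minimal illustration on invented toy data, not the implementation used by accum(); the helper name `cumulative_richness` is hypothetical.

```r
# Hypothetical helper: cumulative count of unique taxa across ordered sampling dates.
cumulative_richness <- function(dates, taxa) {
  ord <- order(dates)
  dates <- dates[ord]
  taxa <- taxa[ord]
  first_seen <- !duplicated(taxa)                 # TRUE the first time each taxon appears
  new_per_date <- tapply(first_seen, dates, sum)  # number of new taxa on each date
  data.frame(date = as.Date(names(new_per_date)),
             richness = as.vector(cumsum(new_per_date)))
}

# Toy data, invented for illustration only
toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-01", "2000-02-01",
                   "2000-02-01", "2000-03-01")),
  taxon = c("A", "B", "A", "C", "C")
)
sac <- cumulative_richness(toy$date, toy$taxon)
print(sac)
```

A curve that keeps rising late in a time series suggests that new taxa are still being recorded, which is the pattern the comparison against a simulated curve is meant to put in context.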
Search algaebase for information about a genus of phytoplankton
algaebase_genus_search(genus = NULL, apikey = NULL, handle = NULL, higher = TRUE, print.full.json = FALSE, newest.only = TRUE, long = FALSE, exact.matches.only = TRUE, return.higher.only = FALSE, api_file = NULL)
genus |
genus name as character string |
apikey |
valid key for algaebase API as character string |
handle |
curl handle with API key. Will be created if not present. |
higher |
boolean should higher taxonomy be included in output? |
print.full.json |
boolean returns raw json output if TRUE. Default is FALSE (return R data frame) |
newest.only |
boolean should results be limited to the most recent matching entry in algaebase? |
long |
boolean return long output including full species name and authorship, and entry date from algaebase. |
exact.matches.only |
boolean should results be limited to exact matches? |
return.higher.only |
boolean should output only include higher taxonomy? |
api_file |
path to text file containing a valid API key |
data frame that may include: accepted.name (currently accepted synonym if different from input name), input.name (name supplied by user), input.match (1 if exact match, else 0), currently.accepted (1=TRUE/0=FALSE), genus.only (1=genus search/0=genus+species search), higher taxonomy (kingdom, phylum, class, order, family), genus, species (always NA for genus search), infraspecies name (always NA for genus search), long.name (includes author and date if given), taxonomic.status (currently accepted, synonym, or unverified), taxon.rank (taxonomic rank of accepted name: genus, species, or infraspecies), mod.date (date when entry was last modified in algaebase).
## Not run:
algaebase_genus_search("Anabaena") #not run
Helper function for parsing output from algaebase
algaebase_output_parse(x, field.name)
x |
list object containing output from an algaebase query |
field.name |
character string |
selected output variable as character vector
Search algaebase for information about a list of phytoplankton names
algaebase_search_df(df, apikey = NULL, handle = NULL, genus.only = FALSE, genus.name = "genus", species.name = "species", higher = TRUE, print.full.json = FALSE, long = FALSE, exact.matches.only = TRUE, api_file = NULL, sleep.time = 1)
df |
data frame containing columns with genus and species names |
apikey |
valid key for algaebase API as character string |
handle |
curl handle with API key. Will be created if not present. |
genus.only |
boolean: should searches be based solely on the genus name? |
genus.name |
name of data.frame column that contains genus names |
species.name |
name of data.frame column that contains species names |
higher |
boolean should higher taxonomy be included in output? |
print.full.json |
boolean returns raw json output if TRUE. Default is FALSE (return R data frame) |
long |
boolean return long output including full species name and authorship, and entry date from algaebase. |
exact.matches.only |
boolean should results be limited to exact matches? |
api_file |
path to text file containing a valid API key |
sleep.time |
delay between algaebase queries (in seconds). Should be at least 1 second if querying more than 10 names at once. |
data frame that may include: accepted.name (currently accepted synonym if different from input name), input.name (name supplied by user), input.match (1 if exact match, else 0), currently.accepted (1=TRUE/0=FALSE), genus.only (1=genus search/0=genus+species search), higher taxonomy (kingdom, phylum, class, order, family), genus, species (always NA for genus search), infraspecies name (always NA for genus search), long.name (includes author and date if given), taxonomic.status (currently accepted, synonym, or unverified), taxon.rank (taxonomic rank of accepted name: genus, species, or infraspecies), mod.date (date when entry was last modified in algaebase).
## Not run:
data(lakegeneva) #example dataset with 50 rows
new.lakegeneva <- genus_species_extract(lakegeneva,'phyto_name')
lakegeneva.algaebase<-algaebase_search_df(new.lakegeneva[1:10,],higher=TRUE,long=TRUE)
head(lakegeneva.algaebase)
## End(Not run)
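The role of sleep.time, a pause between successive queries so that a remote service is not flooded, can be sketched with a plain loop. The lookup function below is a hypothetical stand-in that makes no network request; it only marks where a real query would go, and none of this is taken from the package source.

```r
# Hypothetical stand-in for a single online lookup (no network access here).
lookup_one <- function(genus, species) {
  data.frame(input.name = paste(genus, species), stringsAsFactors = FALSE)
}

# Query each row in turn, pausing between requests.
search_with_delay <- function(df, sleep.time = 1) {
  results <- vector("list", nrow(df))
  for (i in seq_len(nrow(df))) {
    results[[i]] <- lookup_one(df$genus[i], df$species[i])
    if (i < nrow(df)) Sys.sleep(sleep.time)  # no pause needed after the last query
  }
  do.call(rbind, results)
}

toy <- data.frame(genus = c("Anabaena", "Scenedesmus"),
                  species = c("flos-aquae", "bijuga"),
                  stringsAsFactors = FALSE)
out <- search_with_delay(toy, sleep.time = 0.1)  # short delay for the demo only
print(out)
```

With a one-second delay, a list of a few hundred names takes several minutes, so it can be worth de-duplicating names before searching.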
Retrieve taxonomic information from the algaebase online database (www.algaebase.org) based on a user-specified genus and species name. This function requires a valid API key for algaebase.
algaebase_species_search(genus, species, apikey = NULL, handle = NULL, higher = TRUE, print.full.json = FALSE, newest.only = TRUE, long = FALSE, exact.matches.only = TRUE, api_file = NULL)
genus |
genus name as character string |
species |
species name as character string |
apikey |
valid key for algaebase API as character string |
handle |
curl handle with API key. Will be created if not present. |
higher |
boolean should higher taxonomy be included in output? |
print.full.json |
boolean returns raw json output if TRUE. Default is FALSE (return R data frame) |
newest.only |
boolean should results be limited to the most recent matching entry in algaebase? |
long |
boolean return long output including full species name and authorship, and entry date from algaebase. |
exact.matches.only |
boolean should results be limited to exact matches? |
api_file |
path to text file containing a valid API key |
data frame that may include: accepted.name (currently accepted synonym if different from input name), input.name (name supplied by user), input.match (1 if exact match, else 0), currently.accepted (1=TRUE/0=FALSE), genus.only (1=genus search/0=genus+species search), higher taxonomy (kingdom, phylum, class, order, family), genus, species (always NA for genus search), infraspecies name (always NA for genus search), long.name (includes author and date if given), taxonomic.status (currently accepted, synonym, or unverified), taxon.rank (taxonomic rank of accepted name: genus, species, or infraspecies), mod.date (date when entry was last modified in algaebase).
## Not run:
algaebase_species_search("Anabaena","flos-aquae") #not run
fuzzy partial matching between a scientific name and a list of possible matches
bestmatch(enteredName, possibleNames, maxErr = 3, trunc = TRUE)
enteredName |
Character string with name to check |
possibleNames |
Character vector of possible matches |
maxErr |
maximum number of different bits allowed for a partial match |
trunc |
TRUE/FALSE. if true and no match, retry with last three letters truncated |
a character string with the best match, or 'multiplePartialMatches'
possibleMatches=c('Viburnum edule','Viburnum acerifolia')
bestmatch(enteredName='Viburnum edulus',possibleNames=possibleMatches)
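The general idea of fuzzy matching against a list of candidates can be sketched with edit distances from `utils::adist()`. This is an illustrative sketch only: the helper name `closest_name` is hypothetical, and treating the error limit as a maximum edit distance, returning NA when nothing is close enough, and skipping the truncation retry are all assumptions rather than the documented behaviour of bestmatch().

```r
# Hypothetical helper: closest candidate by edit distance, within max_err edits.
closest_name <- function(entered, candidates, max_err = 3) {
  d <- as.vector(utils::adist(entered, candidates))  # edit distance to each candidate
  best <- min(d)
  if (best > max_err) return(NA_character_)          # nothing close enough
  hits <- candidates[d == best]
  if (length(hits) > 1) return("multiplePartialMatches")  # tie between candidates
  hits
}

res <- closest_name("Viburnum edulus",
                    c("Viburnum edule", "Viburnum acerifolia"))
print(res)
```

Reporting ties explicitly, rather than picking one arbitrarily, leaves ambiguous names for manual review.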
Database of functional traits for CSR classification, derived from Rimet et al. 2019
data(csrTraits)
A data frame with columns:
binomial scientific name
genus name
species name
surface area:volume ratio
maximum linear dimension (micrometers)
product of SAV and MLD; unitless
cell or colony biovolume
biological unit (cell or colony) surface area accounting for mucilage
1/0 indicates colonial growth form
literature-based average colony abundance
Shape descriptions. See Rimet et al. 2019 for abbreviations
CSR classification using the traits_to_csr function and criteria from Reynolds 2006
Transform a phytoplankton timeseries into a matrix of abundances for ordination
date_mat(phyto.df, abundance.var = "biovol_um3_ml", summary.type = "abundance", taxa.name = "phyto_name", date.name = "date_dd_mm_yy", format = "%d-%m-%y", time.agg = c("day", "month", "year", "monthyear"), fun = mean_naomit)
phyto.df |
Name of data.frame object |
abundance.var |
Character string: field containing abundance data. Can be NA if the dataset only contains a species list for each sampling date. |
summary.type |
'abundance' for a matrix of aggregated abundance,'presence.absence' for 1 (present) and 0 (absent). |
taxa.name |
Character string: field containing taxonomic identifiers. |
date.name |
Character string: field containing date. |
format |
Character string: POSIX format string for formatting date column. |
time.agg |
Character string: time interval for aggregating abundance. default is day. |
fun |
function for aggregation. default is mean, excluding NA's. |
A matrix of phytoplankton abundance, with taxa in rows and time in columns. If time.agg = 'monthyear', returns a 3-dimensional array (taxa, month, year). If abundance.var = NA, matrix cells will be 1 for present and 0 for absent.
data(lakegeneva) #example dataset with 50 rows
geneva.mat1<-date_mat(lakegeneva,time.agg='month',summary.type='presence.absence')
geneva.mat2<-date_mat(lakegeneva,time.agg='month',summary.type='abundance')
geneva.mat1
geneva.mat2
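The shape of the result, taxa in rows and time steps in columns, can be reproduced on a small scale with base R's `tapply()`. The sketch below uses invented toy data and is not the code behind date_mat(); it only shows how long-format records collapse into an abundance matrix and a presence/absence matrix.

```r
# Toy long-format data, invented for illustration only
toy <- data.frame(
  taxon = c("A", "A", "B", "B"),
  date = c("05-01-00", "20-01-00", "05-01-00", "10-02-00"),
  biovol = c(10, 30, 5, 7),
  stringsAsFactors = FALSE
)

# Parse dates with the same style of format string used in the documentation
d <- as.Date(toy$date, format = "%d-%m-%y")
month <- as.integer(format(d, "%m"))

# Taxa in rows, months in columns; cells hold the mean abundance
abund_mat <- tapply(toy$biovol, list(taxon = toy$taxon, month = month), mean)

# Presence/absence version: 1 where a taxon was recorded, 0 otherwise
pa_mat <- ifelse(is.na(abund_mat), 0, 1)

print(abund_mat)
print(pa_mat)
```

Combinations with no records come out as NA in the abundance matrix, so whether to treat them as true zeros is a decision to make before ordination.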
Wrapper function for several functions in the 'ritis' package. Searches the ITIS database for matches to a genus name.
genus_search_itis(genus, higher = FALSE)
genus |
Character string. genus name to search for in ITIS |
higher |
Boolean. If TRUE, add higher taxonomic classifications to output |
input data.frame with matches, current accepted names, synonyms, and higher taxonomy
genus='Anabaena'
genus_search_itis(genus,higher=FALSE)
Split a dataframe column with binomial name into genus and species columns.
genus_species_extract(phyto.df, phyto.name)
phyto.df |
Name of data.frame object |
phyto.name |
Character string: field in phyto.df containing species name. |
A data.frame with new character fields 'genus' and 'species'
data(lakegeneva) #example dataset with 50 rows
head(lakegeneva) #need to split the phyto_name column
new.lakegeneva=genus_species_extract(lakegeneva,'phyto_name')
head(new.lakegeneva)
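Splitting a binomial name into its parts can be sketched with `strsplit()`. This is a simplified illustration under stated assumptions: the helper name `split_binomial` is hypothetical, everything after the first word is treated as the species epithet, and the package function may handle varieties, abbreviations, or unidentified taxa differently.

```r
# Hypothetical helper: first word becomes genus, the remainder becomes species.
split_binomial <- function(df, name_col) {
  parts <- strsplit(trimws(df[[name_col]]), "\\s+")
  df$genus <- vapply(parts, function(p) p[1], character(1))
  # Everything after the first word; empty string when only a genus is given
  df$species <- vapply(parts, function(p) paste(p[-1], collapse = " "),
                       character(1))
  df
}

toy <- data.frame(phyto_name = c("Anabaena flos-aquae", "Scenedesmus"),
                  stringsAsFactors = FALSE)
toy <- split_binomial(toy, "phyto_name")
print(toy)
```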
Get value of algaebase API key from an environment variable. Returns an error if the variable is not set.
get_apikey()
api key as character string (invisibly)
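Reading a secret from an environment variable and failing loudly when it is missing can be sketched as below. The variable name `MY_ALGAEBASE_KEY` and the helper `read_key` are hypothetical; the name that get_apikey() actually looks up is not stated in this documentation.

```r
# Hypothetical helper and variable name; not the package's own lookup.
read_key <- function(var = "MY_ALGAEBASE_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) stop("Environment variable '", var, "' is not set.")
  invisible(key)  # returned invisibly so the key is not echoed at the console
}

Sys.setenv(MY_ALGAEBASE_KEY = "not-a-real-key")  # placeholder value for the demo
key <- read_key()
```

Keeping the key in an environment variable or a separate file, rather than in a script, makes it less likely to end up in shared code.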
Get value of algaebase API key from a file
get_apikey_fromfile(keyfile)
keyfile |
path to text file |
api key as character string (invisibly)
## Not run:
apikey<-get_apikey_fromfile("keyfile.txt")
Provides convenient output with a row per name, to streamline merging with original data.
gnr_df(df, name.column, sourceid = NULL, fuzzy_uninomial = TRUE, name_type = "canonical_full", higher = FALSE)
df |
data.frame containing names to check |
name.column |
integer or character string with column name containing species names |
sourceid |
integer vector with data source ids. see https://resolver.globalnames.org/sources/ |
fuzzy_uninomial |
boolean. Use fuzzy matching for uninomial names? |
name_type |
Specify format of matched names. Options are 'canonical_simple' (canonical binomial name), 'canonical_full' (with subspecies or subgenera), or 'with_context' (with author and year appended). |
higher |
boolean: Return higher taxonomic classifications? |
new data.frame with original names (input_name), information about match type, the best match (match_name), taxonomic status, and other output from gnr_simple(). Will contain a row of NAs if no matches were found for a name.
data(lakegeneva) #example dataset with 50 rows
lakegeneva<- genus_species_extract(lakegeneva,'phyto_name')
lakegeneva$genus_species <- trimws(paste(lakegeneva$genus, lakegeneva$species))
#checking for matches from all GNRS sources, first 5 rows:
lakegeneva.namematches <- gnr_df(lakegeneva,"genus_species")
lakegeneva.namematches
Checks species names against a variety of online databases using the Global Names Resolver (https://resolver.globalnames.org/); supports fuzzy partial matching. Modified on 11/18/2025 by Vijay Patil ([email protected]) for algaeClassify v2.0.5 (pending approval on CRAN).
gnr_simple(name, sourceid = NULL, best_match = TRUE, fuzzy_uninomial = TRUE, name_type = "canonical_full", higher = FALSE)
name |
character string binomial scientific name to resolve |
sourceid |
integer vector with data source ids. see https://resolver.globalnames.org/sources/ |
best_match |
boolean. Should the best match be returned based on score? |
fuzzy_uninomial |
boolean. Use fuzzy matching for uninomial names? |
name_type |
Specify format of matched names. Options are 'canonical_simple' (canonical binomial name), 'canonical_full' (with subspecies or subgenera), or 'with_context' (with author and year appended). |
higher |
boolean: Return higher taxonomic classifications? |
new data.frame with name matches, column indicating match type and scores from Global Names Resolver (https://resolver.globalnames.org/). Will contain a row of NAs if no matches found
#Visit https://resolver.globalnames.org/data_sources to see all possible
#data sources for name checking.
name<-"Aphanazomenon flos-aquae"
#sourceid=3 for ITIS database
gnr_simple(name,sourceid=3) #search for best match from ITIS
gnr_simple(name,sourceid=NULL,best_match=FALSE) #search for all matches from any source
Wrapper function for applying genus_search_itis and species_search_itis to a whole data.frame containing scientific names
itis_search_df(df, namecol = NA, higher = FALSE, genus.only = FALSE)
df |
data.frame containing names to check |
namecol |
integer or character string with column name containing species or genus names |
higher |
Boolean. If TRUE, add higher taxonomic classifications to output |
genus.only |
boolean If TRUE, search for matches with just the genus name using genus_search_itis |
data.frame with submitted names (orig.name), matched names (matched.name), 1/0 flag indicating that original name is currently accepted (orig.name.accepted), 1/0 flag indicating if search was genus_only (for distinguishing genus_search_itis and species_search_itis results), synonyms if any, and higher taxonomy (if higher=TRUE)
data(lakegeneva)
new.lakegeneva <- genus_species_extract(lakegeneva[1,],'phyto_name')
new.lakegeneva$genus_species <- trimws(paste(new.lakegeneva$genus, new.lakegeneva$species))
lakegeneva.genus.itischeck <- itis_search_df(new.lakegeneva,"genus_species")
lakegeneva.genus.itischeck
Example dataset from Lake Geneva, Switzerland
data(lakegeneva)
A data frame with columns:
lake name
phytoplankton species name
month of sampling
year of sampling
date of sampling
biovolume
Compute mean value while ignoring NA's
mean_naomit(x)
x |
A numeric vector that may contain NA's |
the mean value
data(lakegeneva) #example dataset with 50 rows
mean_naomit(lakegeneva$biovol_um3_ml)
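The effect of ignoring missing values can be seen with base R's `mean()` and its `na.rm` argument. This is a sketch of the behaviour described above, not the package source.

```r
x <- c(2, NA, 4)

m_default <- mean(x)               # NA, because one value is missing
m_naomit <- mean(x, na.rm = TRUE)  # 3, the mean of the non-missing values

print(m_default)
print(m_naomit)
```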
Returns a CSR classification based on Morphofunctional group (MFG). Correspondence based on Salmaso et al. 2015 and Reynolds et al. 1988
mfg_csr_convert(mfg)
mfg |
Character string with MFG name, following Salmaso et al. 2015 |
A character string with values 'C','S','R','CR','SC','SR', or NA
mfg_csr_convert("11a-NakeChlor")
Returns a CSR classification based on Morphofunctional group (MFG). Correspondence based on Salmaso et al. 2015 and Reynolds et al. 1988
mfg_csr_convert_df(phyto.df, mfg)
phyto.df |
dataframe containing a character field containing MFG classifications |
mfg |
Character string: name of the field in phyto.df that contains MFG names, following Salmaso et al. 2015 |
A dataframe with an additional field named CSR, containing CSR classifications or NA
data(lakegeneva)
lakegeneva<-genus_species_extract(lakegeneva,'phyto_name')
lakegeneva<-species_to_mfg_df(lakegeneva)
lakegeneva<-mfg_csr_convert_df(lakegeneva,mfg='MFG')
head(lakegeneva)
MFG-CSR correspondence based on CSR-trait relationships in Reynolds et al. 1988 and MFG-trait relationships in Salmaso et al. 2015
data(mfg_csr_library)
A data frame with columns:
full MFG name from Salmaso et al. 2015
CSR classification including intermediate classes
Functional trait database for MFG classification, derived from Rimet et al. 2019
data(mfgTraits)
A data frame with columns:
binomial scientific name
genus name
species name
1/0 indicates presence/absence of flagella or motility
character values 'large' or 'small'; based on 35 micrometer max linear dimension
1/0 indicates typical colonial growth form or not
1/0 indicates filamentous growth form or not
1/0 indicates diatoms with centric growth form
1/0 indicates presence/absence of mucilage
1/0 indicates presence/absence of aerotopes
Taxonomic class
Taxonomic order
MFG classification using traits_to_mfg function
Aggregate phytoplankton timeseries based on abundance. Up to 3 grouping variables can be given: e.g. genus, species, stationid, depth range. If no abundance var is given, will aggregate to presence/absence of grouping vars.
phyto_ts_aggregate(phyto.data, DateVar = "date_dd_mm_yy", SummaryType = c("abundance", "presence.absence"), AbundanceVar = "biovol_um3_ml", GroupingVar1 = "phyto_name", GroupingVar2 = NA, GroupingVar3 = NA, remove.rare = FALSE, fun = sum, format = "%d-%m-%y")
phyto.data |
data.frame |
DateVar |
character string: field name for date variable. character or POSIX data. |
SummaryType |
'abundance' for a matrix of aggregated abundance,'presence.absence' for 1 (present) and 0 (absent). |
AbundanceVar |
character string with field name containing abundance data. Can be NA if data is only a species list and aggregated presence/absence is desired. |
GroupingVar1 |
character string: field name for first grouping variable. defaults to spp. |
GroupingVar2 |
character string: name of additional grouping var field |
GroupingVar3 |
character string: name of additional grouping var field |
remove.rare |
TRUE/FALSE. If TRUE, removes all instances of GroupingVar1 that occur < 5 of time periods. |
fun |
function used to aggregate abundance based on grouping variables |
format |
character string: format for DateVar POSIXct conversion |
a data.frame with grouping vars, date_dd_mm_yy, and abundance or presence/absence
data(lakegeneva)
lakegeneva<-genus_species_extract(lakegeneva,'phyto_name')
lg.genera=phyto_ts_aggregate(lakegeneva,SummaryType='presence.absence', GroupingVar1='genus')
head(lg.genera)
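Aggregating abundance within each date and grouping variable can be sketched with base R's `aggregate()`. The toy data are invented and the code is an illustration of the idea, not the implementation of phyto_ts_aggregate().

```r
# Toy data, invented for illustration only
toy <- data.frame(
  date = c("05-01-00", "05-01-00", "05-01-00", "10-02-00"),
  genus = c("Anabaena", "Anabaena", "Scenedesmus", "Anabaena"),
  biovol = c(10, 30, 5, 7),
  stringsAsFactors = FALSE
)

# Sum abundance within each date x genus combination
agg <- aggregate(biovol ~ date + genus, data = toy, FUN = sum)

# Presence/absence: every combination that appears in the data is marked 1
pa <- unique(toy[, c("date", "genus")])
pa$present <- 1

print(agg)
print(pa)
```

Summing is a natural choice when several species records are pooled to genus level; a mean may suit replicate samples from the same date better.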
Visually assess change in sampling effort over time (author: Dietmar Straile)
sampeff(b_data, column, save.pdf = FALSE, lakename = "", datecolumn = "date_dd_mm_yy", dateformat = "%d-%m-%y")
b_data |
Name of data.frame object |
column |
column name or number for field containing abundance (biomass, biovol, etc.). Can be NA for presence/absence data. |
save.pdf |
TRUE/FALSE Should the output plot be saved to a file? defaults to FALSE |
lakename |
Character string for labeling output plot |
datecolumn |
Character String or number specifying dataframe field with date information |
dateformat |
Character string specifying POSIX data format |
a time-series plot of minimum relative abundance over time. This should change systematically with counting effort.
data(lakegeneva) #example dataset with 50 rows
sampeff(lakegeneva,column=6) #column 6 contains biovolume
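One plausible reading of "minimum relative abundance" is the smallest share of total abundance recorded on each sampling date. The sketch below computes that quantity on invented toy data; the interpretation is an assumption and the code is not taken from sampeff().

```r
# Toy data, invented for illustration only
toy <- data.frame(
  date = as.Date(c("2000-01-05", "2000-01-05",
                   "2000-02-10", "2000-02-10", "2000-02-10")),
  biovol = c(90, 10, 50, 49, 1)
)

# Smallest share of total abundance recorded on each sampling date
min_rel <- tapply(toy$biovol, toy$date, function(x) min(x / sum(x)))
print(min_rel)
```

A lower minimum on later dates, as in the second date here, would be consistent with rarer taxa being detected as counting effort increases.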
Add algaebase API key to curl handle
set_algaebase_apikey_header(apikey = NULL)
apikey |
character string with valid key |
curl handle object
Trait-based MFG classifications for common Eurasian/North American phytoplankton species. See accompanying manuscript for sources.
data(species_mfg_library)
A data frame with columns:
genus name
species name
corresponding MFG classification based on Salmaso et al. 2015
literature or online source for MFG classification
Algaebase https://www.algaebase.org
Phycokey https://www.cfb.unh.edu/phycokey/phycokey.htm
Western Diatoms of North America https://diatoms.org
CyanoDB 2 http://www.cyanodb.cz/
Nordic Microalgae https://nordicmicroalgae.org
Phytopedia https://phytoplankton.eoas.ubc.ca/
Kapustin, D., Sterlyagova, I. and Patova, E., 2019. Morphology of Chrysastrella paradoxa stomatocysts from the Subpolar Urals (Russia) with comments on related morphotypes. Phytotaxa, 402(6), pp.295-300.
Wrapper function for several functions in the 'ritis' package. Searches the ITIS database for matches to a binomial scientific name and outputs matches, current accepted names, synonyms, and higher taxonomy.
species_search_itis(genspp, higher = FALSE)
genspp |
Character string. Binomial scientific name with space between genus and species. |
higher |
Boolean. If TRUE, add higher taxonomic classifications to output |
data.frame with submitted name (orig.name), matched name (matched.name), 1/0 flag indicating that original name is currently accepted (orig.name.accepted), 1/0 flag indicating if search was genus_only (for distinguishing genus_search_itis and species_search_itis results), synonyms if any, and higher taxonomy (if higher=TRUE)
species="Aphanizomenon flosaquae"
species_search_itis(species,higher=FALSE)
Conversion of a single genus and species name to a single MFG. Uses species_mfg_library.
species_to_mfg(genus, species = "", flag = 1, mfgDbase = NA)
genus |
Character string: genus name |
species |
Character string: species name |
flag |
Resolve ambiguous mfg: 1 = return(NA),2= manual selection |
mfgDbase |
data.frame of species MFG classifications. Defaults to the supplied species_mfg_library data object |
a data frame with MFG classification and diagnostic information: ambiguous.mfg=1 if there are multiple possible MFG matches; genus.classification=1 if no exact match was found with genus + species name; partial.match=1 if the MFG was based on fuzzy matching of the taxonomic name.
species_to_mfg('Scenedesmus','bijuga') #returns "11a-NakeChlor"
Wrapper function to apply species_to_mfg() across a data.frame
species_to_mfg_df(phyto.df, flag = 1, mfgDbase = NA)
phyto.df |
Name of data.frame. Must have character fields named 'genus' and 'species' |
flag |
Resolve ambiguous MFG: 1 = return(NA), 2 = manual selection |
mfgDbase |
specify library of species to MFG associations. |
input data.frame with a new character column of MFG classifications and diagnostic information
data(lakegeneva) #example dataset with 50 rows
new.lakegeneva <- genus_species_extract(lakegeneva,'phyto_name')
new.lakegeneva <- species_to_mfg_df(new.lakegeneva)
head(new.lakegeneva)
Surface/volume ratio and maximum linear dimension criteria for CSR classification, from Reynolds 1988 and Reynolds 2006
data(traitranges)
A data frame with columns:
measurement type
minimum value for C
minimum value for S
minimum value for R
maximum value for C
maximum value for S
maximum value for R
units of measurement
source for criteria
Assign phytoplankton species to CSR functional groups, based on surface to volume ratio and maximum linear dimension ranges proposed by Reynolds et al. 1988;2006
traits_to_csr(sav, msv, msv.source = "Reynolds 2006", traitrange = algaeClassify::traitranges)
sav |
numeric estimate of cell or colony surface area /volume ratio |
msv |
numeric product of surface area/volume ratio and maximum linear dimension |
msv.source |
character string with reference source for distinguishing criteria |
traitrange |
data frame with trait criteria for c,s,r groups. The included table can be replaced with user-defined criteria if desired. Measurements are: Surface area/volume ratio (sav), maximum linear dimension (mld) and mld*sav (msv). |
a character string with one of 5 return values: C,CR,S,R, or SR. CR and SR groups reflect overlap between criteria for the 3 main groups.
<https://powellcenter.usgs.gov/geisha> for project information
traits_to_csr(sav=0.2,msv=10,msv.source='Reynolds 2006',traitrange=traitranges)
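Because the traitrange argument accepts a user-supplied table, a reasonable starting point is to copy the bundled criteria and inspect them before editing. The sketch below does only that: it copies algaeClassify::traitranges, prints its structure, and passes the unchanged copy back in. The variable names sav.value, msv.value, and my.ranges are hypothetical, and the editing step is left out because the column names of the table are not listed here.

```r
# Input values taken from the example call above.
sav.value <- 0.2
msv.value <- 10

# Guarded so the script still runs when the package is not installed.
if (requireNamespace("algaeClassify", quietly = TRUE)) {
  # Copy the bundled criteria table; edit this copy to define custom ranges.
  my.ranges <- algaeClassify::traitranges
  str(my.ranges)  # inspect the columns before changing anything

  # Pass the (here unmodified) copy explicitly instead of relying on the default.
  csr <- algaeClassify::traits_to_csr(
    sav = sav.value,
    msv = msv.value,
    msv.source = "Reynolds 2006",
    traitrange = my.ranges
  )
  print(csr)
}
```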
Add CSR functional group classifications to a dataframe of phytoplankton species, based on the surface to volume ratio and maximum linear dimension ranges proposed by Reynolds 1988 and Reynolds 2006
traits_to_csr_df(df, sav, msv, msv.source = "Reynolds 2006", traitrange = algaeClassify::traitranges)
| df | name of dataframe |
|---|---|
| sav | character string with name of column that contains surface to volume ratio values |
| msv | character string with name of column that contains maximum linear dimension * surface to volume ratio values |
| msv.source | character string with reference source for distinguishing criteria |
| traitrange | data frame with trait criteria for c,s,r groups. The included table can be replaced with user-defined criteria if desired. Measurements are: Surface area/volume ratio (sav), maximum linear dimension (mld) and mld*sav (msv). |
a character vector with one classification per row of the dataframe; each element is one of 5 values: C, CR, S, SR, or R
csr.df<-data.frame(msv=10,sav=1)
csr.df$CSR<-traits_to_csr_df(csr.df,sav='sav',msv='msv')
print(csr.df)
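In practice the msv column usually has to be derived first. The argument descriptions define msv as maximum linear dimension (mld) multiplied by the surface area/volume ratio (sav), so the sketch below computes it from raw measurements before classifying. The mld values and the data frame name traits.df are hypothetical, chosen so that the resulting sav and msv pairs match the inputs used in the examples on this page. Units are assumed to match those expected by the bundled criteria table.

```r
# Hypothetical raw measurements: surface/volume ratio and maximum linear dimension.
traits.df <- data.frame(
  sav = c(0.2, 1),
  mld = c(50, 10)
)

# msv is defined as mld * sav
traits.df$msv <- traits.df$mld * traits.df$sav

# Guarded call: only runs when the package is available.
if (requireNamespace("algaeClassify", quietly = TRUE)) {
  # Name the column arguments explicitly to avoid swapping sav and msv.
  traits.df$CSR <- algaeClassify::traits_to_csr_df(
    traits.df,
    sav = "sav",
    msv = "msv"
  )
}
print(traits.df)
```

Passing sav and msv by name is a small safeguard: both arguments are column-name strings, so a positional mix-up would not raise an error.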
Assign MFG based on binary functional traits and taxonomy (Class and Order)
traits_to_mfg(flagella = NA, size = NA, colonial = NA, filament = NA, centric = NA, gelatinous = NA, aerotopes = NA, class = NA, order = NA)
| flagella | 1 if flagella are present, 0 if they are absent. |
|---|---|
| size | Character string: 'large' or 'small'. Classification criteria are left to the user. |
| colonial | 1 if typically colonial growth form, 0 if typically unicellular. |
| filament | 1 if dominant growth form is filamentous, 0 if not. |
| centric | 1 if diatom with centric growth form, 0 if not. NA for non-diatoms. |
| gelatinous | 1 if a mucilaginous sheath is typically present, 0 if not. |
| aerotopes | 1 if aerotopes allowing buoyancy regulation are typically present, 0 if not. |
| class | Character string: The taxonomic class of the species |
| order | Character string: The taxonomic order of the species |
A character string of the species' morphofunctional group
traits_to_mfg(flagella = 1,size = "large",colonial = 1,filament = 0,centric = NA,gelatinous = 0, aerotopes = 0,class = "Euglenophyceae",order = "Euglenales")
Assign morphofunctional groups to a dataframe of functional traits and higher taxonomy
traits_to_mfg_df(dframe, arg.names = c("flagella", "size", "colonial", "filament", "centric", "gelatinous", "aerotopes", "class", "order"))
| dframe | An R dataframe containing functional trait information and higher taxonomy |
|---|---|
| arg.names | Character vector of column names corresponding to arguments for traits_to_mfg() |
A character vector containing morpho-functional group (MFG) designations
#create a two-row example dataframe of functional traits
func.dframe=data.frame(flagella=1,size=c("large","small"),colonial=0,filament=0,centric=NA,
gelatinous=0,aerotopes=0,class="Euglenophyceae",order="Euglenales",
stringsAsFactors=FALSE)
#check the dataframe
print(func.dframe)
#run the function to produce a two-element character vector
func.dframe$MFG<-traits_to_mfg_df(func.dframe,c("flagella","size","colonial",
"filament","centric","gelatinous",
"aerotopes","class","order"))
print(func.dframe)
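When a trait table uses different column names, arg.names can presumably be used to point the function at them. The sketch below assumes that arg.names is matched by position to the arguments of traits_to_mfg() (flagella, size, colonial, filament, centric, gelatinous, aerotopes, class, order); that ordering is an assumption based on the default value, not something stated explicitly. All column names in the data frame and the helper vector col.map are hypothetical.

```r
# Hypothetical trait table whose column names differ from the argument names.
my.traits <- data.frame(
  has_flagella = 1,
  size_class = c("large", "small"),
  is_colonial = 0,
  is_filament = 0,
  is_centric = NA,
  has_mucilage = 0,
  has_aerotopes = 0,
  tax_class = "Euglenophyceae",
  tax_order = "Euglenales",
  stringsAsFactors = FALSE
)

# Column names listed in the assumed positional order of traits_to_mfg() arguments.
col.map <- c("has_flagella", "size_class", "is_colonial", "is_filament",
             "is_centric", "has_mucilage", "has_aerotopes",
             "tax_class", "tax_order")

# Guarded call: only runs when the package is available.
if (requireNamespace("algaeClassify", quietly = TRUE)) {
  my.traits$MFG <- algaeClassify::traits_to_mfg_df(my.traits, arg.names = col.map)
}
print(my.traits)
```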