Revisions to Which test do I use to estimate the preference of species?

Fixed typo

edited 4 hours ago

jginestet

This is not an answer, but a series of remarks which would not fit well in comments.

My biggest issue with your analysis is how you dichotomize the abundance information into simple presence/absence. If your “goal is to estimate the preference per species per management intensity”, you are losing a lot of information (e.g. in your example data you have one abundance of 12 and another of 2, and you code both as “1”, i.e. present). I definitely would not do that.
Then there is how you deal with “unobserved”. You say that “The surveys were not strictly standardized: observers walked through the study area and recorded species when encountered”. And then, you assume “that if a species was not recorded within a given sampling unit (Code + management), it was absent”. That is a bold assumption; lack of observation is not the same as “absence”. I would not try to “fabricate” data on absence; you just do not know that. Your surveys should have been more carefully planned to actually record complete absence. After the fact, you will just have to use the data you have (and not “invent” it).
Moreover, you seem to detect presence/absence only at the Code+Management level, and not at the raster level. Why? And you seem to run your models without using the raster cell? You are losing all the observations at the raster cell level; for a given site (Code) and Management type, you only have 1 observation. Again, lost information.
You run your model on “overall” abundance, as if all species behaved the same. That does not seem to make sense; not all species will perform the same under a given management style.
And yes, if you want to test the abundance of individual species, you will need to run separate regressions. You say that you get “very high standard errors and many p-values are not significant”. That is most likely due to low power (not enough observations); this is not surprising given all the information you lost along the way (aggregating raster cells; dichotomizing abundance).
Your location Code can only take 2 values: I would not use it in the model. Instead I would bake the Code in the raster cell ID (e.g. all cells for location 1 start with A, or 1, similar for location 2). With only 2 locations, there is not enough data to analyze the outcomes per location (and that is not even a goal of yours).
I would run the model on abundance~Speciescode+Raster+Management, and then individually by species. Hopefully you would have enough observations (at least one per raster?) to get decent power. However, see the question below on Raster vs Management.
And finally, a question: is the management style the same per raster? Or can a cell be divided into different management styles? If it is the same, then you can drop the raster factor from the models, and just keep Code.
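
To make a few of the remarks above concrete, here is a minimal Python sketch. It assumes the data can be laid out as one row per (Code, raster cell, management, species, abundance); the records, the `A`/`B` location labels and the helper names `cell_id`, `dichotomize` and `management_by_cell` are all invented for illustration and are not taken from your data. It shows how counts of 12 and 2 collapse to the same value once dichotomized, how the Code can be folded into the raster cell ID, and how to check whether any cell carries more than one management style.

```python
from collections import defaultdict

# Hypothetical records: (code, raster, management, species, abundance).
# Column meanings follow the question; the values are invented for illustration.
records = [
    ("A", 1, "low", "sp1", 12),
    ("A", 1, "low", "sp2", 2),
    ("A", 2, "high", "sp1", 3),
    ("B", 1, "high", "sp1", 5),
    ("B", 1, "low", "sp2", 1),
]


def cell_id(code, raster):
    """Fold the location Code into the raster cell ID, e.g. ("A", 1) -> "A1"."""
    return f"{code}{raster}"


def dichotomize(abundance):
    """Presence/absence coding: every positive count collapses to 1."""
    return 1 if abundance > 0 else 0


def management_by_cell(rows):
    """Map each cell ID to the set of management styles recorded in it."""
    styles = defaultdict(set)
    for code, raster, management, _species, _abundance in rows:
        styles[cell_id(code, raster)].add(management)
    return dict(styles)


styles = management_by_cell(records)
# Cells with more than one style: there, Raster and Management are not redundant.
mixed_cells = sorted(cell for cell, found in styles.items() if len(found) > 1)

print([dichotomize(row[4]) for row in records])  # 12 and 2 become the same value
print(mixed_cells)
```

If `mixed_cells` comes back empty on the real data, management is constant within each cell, which is the situation the last question asks about.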

added 7 characters in body

edited 9 hours ago

Nick Cox

This is not an answer, but a series of remarks which would not fit well in comments.

My biggest issue with your analysis is how you dichotomize the abundance information to simply absence/presence. If your “goal is to estimate the preference per species per management intensity”, you are losing a lot of information (e.g. in your example data you have 1 abundance at 12, another at 2, and you are equating both as “1”, i.e. present). I definitively would not do that.
Then there is how you deal with “unobserved”. You say that “The surveys were not strictly standardized: observers walked through the study area and recorded species when encountered”. And then, you assume “that if a species was not recorded within a given sampling unit (Code + management), it was absent”. That is a bold assumption; lack of observation is not the same as “absence”. I would not try to “fabricate” data on absence; you just do not know that. Your surveys should have been more carefully planned to actually record complete absence. Post-facto, you will just have to use the data you have (and not “invent” it).\
Moreover, you seem to detect presence/absence only at the Code+Management level, and not at the raster level. Why? And you seem to run your models w/o using the raster cell? You are losing all the observations at the raster cell level; for a given site (Code) and Management type, you only have 1 observation. Again, lost information
You run your model on “overall” abundance, as if all species behaved the same. That does not seem to make sense; not all species will perform the same under a given management style.
And yes, if you want to test the abundance of individual species, you will need to run separate regressions. You say that you get “very high standard errors and many p-values are not significant”. That is most likely due to low power (not enough observations); this is not surprising given all the information you lost along the way (aggregating raster cells; dichotomizing abundance).
Your location Code can only take 2 values: I would not use it in the model. Instead I would bake the Code in the raster cell ID (e.g. all cells for location 1 start with A, or 1, similar for location 2). With only 2 locations, there is not enough data to analyze the outcomes per location (and that is not even a goal of yours).
I would run the model on abundance~Speciescode+Raster+Management, and then individually by species. Hopefully you would have enough observations (at least one per raster?) to get decent power. However see question below on Raster vs Management.
And finally a question; is the management style the same per raster? Or can the cell be divided into different management styles? If it is the same, then you can drop the raster factor from the models, and just keep Code.

This is not an answer, but a series of remarks which would not fit well in comments.

My biggest issue with your analysis is how you dichotomize the abundance information to simply absence/presence. If your “goal is to estimate the preference per species per management intensity”, you are losing a lot of information (e.g. in your example data you have 1 abundance at 12, another at 2, and you are equating both as “1”, i.e. present). I definitely would not do that.
Then there is how you deal with “unobserved”. You say that “The surveys were not strictly standardized: observers walked through the study area and recorded species when encountered”. And then, you assume “that if a species was not recorded within a given sampling unit (Code + management), it was absent”. That is a bold assumption; lack of observation is not the same as “absence”. I would not try to “fabricate” data on absence; you just do not know that. Your surveys should have been more carefully planned to actually record complete absence. After the fact, you will just have to use the data you have (and not “invent” it).\
Moreover, you seem to detect presence/absence only at the Code+Management level, and not at the raster level. Why? And you seem to run your models without using the raster cell? You are losing all the observations at the raster cell level; for a given site (Code) and Management type, you only have 1 observation. Again, lost information.
You run your model on “overall” abundance, as if all species behaved the same. That does not seem to make sense; not all species will perform the same under a given management style.
And yes, if you want to test the abundance of individual species, you will need to run separate regressions. You say that you get “very high standard errors and many p-values are not significant”. That is most likely due to low power (not enough observations); this is not surprising given all the information you lost along the way (aggregating raster cells; dichotomizing abundance).
Your location Code can only take 2 values: I would not use it in the model. Instead I would bake the Code in the raster cell ID (e.g. all cells for location 1 start with A, or 1, similar for location 2). With only 2 locations, there is not enough data to analyze the outcomes per location (and that is not even a goal of yours).
I would run the model on abundance~Speciescode+Raster+Management, and then individually by species. Hopefully you would have enough observations (at least one per raster?) to get decent power. However see question below on Raster vs Management.
And finally a question; is the management style the same per raster? Or can the cell be divided into different management styles? If it is the same, then you can drop the raster factor from the models, and just keep Code.

answered 10 hours ago

jginestet

This is not an answer, but a series of remarks which would not fit well in comments.

My biggest issue with your analysis is how you dichotomize the abundance information to simply absence/presence. If your “goal is to estimate the preference per species per management intensity”, you are losing a lot of information (e.g. in your example data you have 1 abundance at 12, another at 2, and you are equating both as “1”, i.e. present). I definitively would not do that.
Then there is how you deal with “unobserved”. You say that “The surveys were not strictly standardized: observers walked through the study area and recorded species when encountered”. And then, you assume “that if a species was not recorded within a given sampling unit (Code + management), it was absent”. That is a bold assumption; lack of observation is not the same as “absence”. I would not try to “fabricate” data on absence; you just do not know that. Your surveys should have been more carefully planned to actually record complete absence. Post-facto, you will just have to use the data you have (and not “invent” it).\
Moreover, you seem to detect presence/absence only at the Code+Management level, and not at the raster level. Why? And you seem to run your models w/o using the raster cell? You are losing all the observations at the raster cell level; for a given site (Code) and Management type, you only have 1 observation. Again, lost information
You run your model on “overall” abundance, as if all species behaved the same. That does not seem to make sense; not all species will perform the same under a given management style.
And yes, if you want to test the abundance of individual species, you will need to run separate regressions. You say that you get “very high standard errors and many p-values are not significant”. That is most likely due to low power (not enough observations); this is not surprising given all the information you lost along the way (aggregating raster cells; dichotomizing abundance).
Your location Code can only take 2 values: I would not use it in the model. Instead I would bake the Code in the raster cell ID (e.g. all cells for location 1 start with A, or 1, similar for location 2). With only 2 locations, there is not enough data to analyze the outcomes per location (and that is not even a goal of yours).
I would run the model on abundance~Speciescode+Raster+Management, and then individually by species. Hopefully you would have enough observations (at least one per raster?) to get decent power. However see question below on Raster vs Management.
And finally a question; is the management style the same per raster? Or can the cell be divided into different management styles? If it is the same, then you can drop the raster factor from the models, and just keep Code.
