PlantNet (PL@ntNet) is the same way - a large amount of data is published under permissive licenses but the main models are all proprietary, with only occasional one-off research projects made public. It's operated by a consortium of French government agencies and non-profits (to the extent that these things are equivalent in France).
I wouldn't mind these groups keeping their models private except that their success sucks all the air out of the room when it comes to developing fully open models. The vast majority of users are satisfied with the app or API and so if you aren't you're going to be going it alone. (Of course a for-profit company could have the same effect, but it feels extra bad when it's a non-profit/government agency doing it.)
Here in Australia, I know a plant is an invasive species when iNaturalist/Seek can identify it :) It's not good with native Australian plants so I'd love to build a local solution. Shame it's closed.
I've found it's quite good for Aussie plants, although you do need to have the right photographs for the organism type in question. Generally you want close-ups of flowers, leaves and seedpods, bark and general whole-plant images for flowering plants. If you just provide a whole-plant image it will often fail. This is presumably partly because without details they look too similar, and partly because they reduce the resolution to 2048 pixels maximum before storing it for evaluation. Relatively speaking it currently sucks for grasses, some plants and ferns, many fungi, and almost all insects.
When did they update their model? It used to often fail to identify things, and now it makes wildly wrong guesses all the time. I wish it would just show the top possibilities. I know it’s not a rust rump tarantula so show me the other options nearby.
Quote: iNaturalist makes a subset of its machine learning models publicly available while keeping full species classification models private due to intellectual property considerations and organizational policy.
Shame! IMHO open data input should yield open data output. The community contribute far too much time, data, expertise and money to tolerate this kind of BS, which opens questions about fundamental compatibility with science.
iNaturalist should remove non-open data and commit to fully open output within a fixed period of time to maintain community support.
I don't know. iNaturalist contributes a valuable identification service in their free apps. I'm a relatively experienced naturalist, but greatly helped by the iOS app, simply because remembering hundreds of species names is hard, even when the species are familiar. And the identification abilities of their models outside of my own domain are just stupendous, and freely available.
If they feel like keeping the models to themselves, I think it's fair game. I give them observations, they give me the ID service for free. Maybe they even sell the models to fund their development efforts? I wouldn't mind... they need to fund their functions somehow anyway.
And remember, their observation databases are open. In fact my observations are automatically copied to the databases of a national biodiversity institution (which is open as well, except for some critical species).
Institutions need to maintain themselves and be able to pay their employees so that they can feed their kids, etc.
This comment makes no sense as iNaturalist are a non-profit and they run on donations. Data is hosted for free by AWS under the Amazon Open Data Sponsorship Program, which covers infrastructure and bandwidth costs. The community already pays staff salaries. "For 2024, reported revenue was $4.71M, with over 94% coming from contributions and philanthropic sources". See https://www.inaturalist.org/pages/financials (~20 full-time employees).
They get almost all of their money from the Gordon and Betty Moore Foundation, NSF, Nat Geo Society and the like. The "community" are almost entirely free-riders. I have donated a bit I think (maybe $20 or so), but their statistics say 0.24% of users donate.
That's probably not a sustainable situation.
Sounds like you're arguing that it should be open given that two non-profits and the US government are the primary funders of it. Given how non-profits tend to get a high degree of public subsidy as well (indirect tax subsidies to non-profits and the people that donate to them are essentially paid by the US taxpayer), we can reasonably conclude that the US taxpayer is footing the bill here.
I'm not here to say it _should_ be open. Instead, I'm saying they offer a valuable, international (global) service and I want their economics to be sustainable, and I personally have no objections to them keeping their AI models private if they so wish.
Meanwhile, the whole idea of iNaturalist has evolved around voluntary reporting, community involvement, and open data, and I think some of that needs to stay. They can't turn fully commercial.
I use iNaturalist. I do not make monetary donations. I use their tools to make observations and contribute them to the open data set. How am I a free-rider? Without the data contributions from the community, all that money buys you is a bunch of empty servers.
Not sure where to start here. The community include much of the actual scientific community. As such they have dedicated their lives to accruing specialist and leading knowledge often not held by any other individual - now or ever. Accusing them of free riding is simply ignorant.
IMHO funds received by well-run non-profits will be banked, not spent, therefore they yield ongoing returns which are used to meet costs and sustain the organization. The fund origin is immaterial.
Yeah, just wanted to point out in my reply that the community is not paying the salaries of their employees, the community in the sense of me and you, those who send observations and IDs. The money mostly comes from big donors. To me that sounds a bit like bootstrapping the system, a startup if you wish. Moore giving them $10M per year from here to eternity is not a plausible future.
If there are IP considerations, as they say, it sounds like it might not be entirely open data.
They're a scientific 501(c)(3), not a FOSS 501(c)(3), right? It seems like their mission should be to support scientific progress, and sometimes that means using data that is encumbered with IP baggage. It seems like it would be against their mission (and borderline a violation of tax law) to take a stance on IP law... that isn't what they do.
Scientific 501(c)(3) Nonprofits are organized primarily to conduct scientific research in the public interest. Their research must benefit the general public, not specific individuals or commercial enterprises.
This aligns with the suggestion to commit to fully open data and fully open models.
Yeah, "their research" and "not specific individuals or commercial enterprises" both being key. IP baggage from others is inherently not theirs. And it's not like they're cutting dividend checks from sales of private models.
Using scientific data that they can use to do science with but they can't share is 100% legit.
Not sure where this conclusion comes from. What is their research activity? Making data models. How is that in the public interest?
IMHO it's very hard to argue that something is in the public interest if the public can't see it, hold it, analyze it, criticize it, and replicate it: particularly in the field of science where we have a replication crisis.
If it's a black-box service, it's not science.
If it's replicable and open, thus provable, it's science.
I don't know that much about them, but it looks like they have 57 repositories on GitHub. Are they redirecting income to enrich specific people or commercial enterprises? I don't see anyone claiming that they are.
There is no requirement that a 501(c)(3) post everything publicly.
I completely understand and agree that sharing science is a good thing... but it is also dumb to suggest that scientists must put their head in the sand and ignore data that just happens to be under copyright. And just because it is, doesn't mean that it can't be reviewed -- it means it can't be redistributed.
I mean, for heaven's sake, every science textbook I ever read in school was encumbered by copyright. That doesn't mean we should burn science textbooks or that the data in them is subject to some replication crisis.
I think you're building a mountain out of a molehill here.
A significant amount of international resource is flowing toward this project and the basic guarantees that the output (noting the process for its creation supersedes the significance of any individual artifact) will be available to the community are absent. In their place, we see hand-waving justifications. This is unacceptable. You may consider this a molehill, others do not. Science does not.
I really don't think you speak for all of science with the suggestion that scientists ignore copyrighted data. It is plainly routine for scientists to work with data that is encumbered by legal obligations beyond their control... because you can still use it to learn things and draw useful conclusions. While there is an opportunity to criticize this politically, it isn't scientists' fault for having to follow the law as it stands.
Is this not so that they could divide openly-collected data and sell this to researchers? Kinda like Libby, but for biology?
Please share any reputable source that suggests they are selling data products.
Not currently, but imo they _could_ sell both data products and identification to research institutions. Like having the raw data still free, but charging for derivatives, or for professional service, incl. support, related to that data.
Especially selling identification services, which is related to keeping the models private, would make sense. Museums and various kinds of biodiversity monitoring schemes need mass identification, and having AI there to partially replace people would be a cost saving for the researchers and potential funding for iNaturalist. Offering such a service for free is neither practical nor justified.
(Meanwhile, I can imagine there are lots of naturalists who hate the idea of their services being partially replaced by AI. It may lower the quality but the cost margin between a human and an iNat model is really wide.)
I think the EU had a plan to use AI identification in some of their monitoring schemes. It could have been iNaturalist or someone else, but either way it demonstrates the need.