I find that the tricky part of good data analysis is knowing the biases in your data, which often come from the data collection process and are not captured in the data itself.
I have seen plenty of overoptimistic results due to improper construction of training, validation, and test sets, or to using the wrong metrics to evaluate trained models.
It is not clear to me that this project will help overcome those challenges, and I am a bit concerned that if this project or similar ones become popular, these problems may become more prevalent.
Another concern is that the "customer" asking the question usually wants a specific result (something significant, some correlation...). If, through an LLM connected to this tool, my customer finds something that is wrong but aligned with what they want, then as a data scientist/statistician I face the challenge of making the customer understand that the LLM gave a wrong answer, which is more work for me.
Maybe with some well-behaved datasets and proper context this project will become very useful; we will see :-)
I agree with all of this. I've worked in optical engineering, bioinformatics, and data science writ large for over a decade, and knowing the data collection process is foundational to statistical process control and statistical design of experiments. I've watched former employers light cash on fire chasing results from methods like the ones this MCP runs on the backend, purely for lack of measurement/experimental context.
A huge red flag to me is that the tool calls here are stateless (every tool call is carried out by a new R process) which means the state has to live in the agent’s context, exactly where you don’t want it for so many reasons. For example, reading a 20MB CSV will immediately end the conversation for any LLM that exists today. And even if it fits, you’re asking the LLM driving this to transcribe the data verbatim to other tools—it has to literally generate the tokens for the data one by one (as opposed to just passing a variable name or expression). This is very slow, very expensive, capped at max output token count, and an opportunity for the LLM to make a mistake.
If the author(s) want to reach out to me, I’m happy to talk about alternative approaches or the extensive native R LLM tooling that exists now. Email in profile.
Realistically, even a 100-line CSV will get hallucinated on after a few tool calls. The state/context must 100% be offloaded to the MCP server if you expect the LLM to handle it with any reliability at all.
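Concretely, the stateful version isn't much code. A rough sketch in plain base R of what "keep the data server-side, hand the LLM a handle" looks like (the function and handle names here are made up, not the project's actual API):

    # Hypothetical sketch: one long-lived R session owns the data; tools
    # exchange small handles and summaries, never the raw rows.
    .store <- new.env(parent = emptyenv())

    tool_load_csv <- function(path) {
      handle <- paste0("df_", length(ls(.store)) + 1)
      assign(handle, read.csv(path), envir = .store)
      df <- get(handle, envir = .store)
      # The model only ever sees this list, not the 20 MB of rows.
      list(handle = handle, rows = nrow(df), columns = names(df))
    }

    tool_fit_lm <- function(handle, formula) {
      df <- get(handle, envir = .store)
      fit <- lm(as.formula(formula), data = df)
      summary(fit)$coefficients
    }

Subsequent tool calls then pass "df_1" instead of re-serializing the whole table through the model's context.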
Sadly, I've noticed that even with a 100% stateful MCP, Claude sometimes just hallucinates.
I love R and am always excited about tools for R, but I immediately get suspicious when I see things like:
> RMCP has been tested with real-world scenarios achieving 100% success rate:
All the Python-based functionality of this project can now be handled by the mcptools package[1]. That is, mcptools can field MCP requests and dispatch to R code; no need for an intermediate layer of Python. I wonder if the author knows about mcptools? Or did he start coding before it was available?
[1] https://posit-dev.github.io/mcptools/
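For what it's worth, the shape of it is roughly the following. This is a rough sketch based on the mcptools/ellmer docs; the tool name and exact signatures are my assumptions, so check the link above for the real interface:

    # Rough sketch only - tool()/mcp_server() signatures may differ by version;
    # see the mcptools docs linked above.
    library(ellmer)
    library(mcptools)

    fit_lm <- tool(
      function(csv_path, formula) {
        df <- read.csv(csv_path)
        summary(lm(as.formula(formula), data = df))$coefficients
      },
      name = "fit_lm",
      description = "Fit a linear model to a CSV and return the coefficient table.",
      arguments = list(
        csv_path = type_string("Path to a CSV file"),
        formula  = type_string("An R model formula, e.g. 'mpg ~ wt + hp'")
      )
    )

    # Serve the tool over stdio to any MCP client, straight from R.
    mcp_server(tools = list(fit_lm))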
Without additional setup, GPT-5 already uses Python as it deems necessary (e.g. for calculations). Is an R MCP server any different from GPT-5 (which automatically uses Python)? My reasoning: they're both an LLM plus a REPL, so (I think) that makes them approximately equal. Or is there some advantage to using an MCP server?
This will kick off a real wave of AI slop hitting journals, won't it? There is already a p-hacking problem; no help needed.
If you run more than one test, you are bound to eventually get a false-positive "significant" result.
I don't know where I'm going with this. I'm using AI a lot myself, always supervised. This hits different.
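The arithmetic is brutal: at alpha = 0.05, twenty independent tests on pure noise give roughly a 64% chance (1 - 0.95^20) of at least one "significant" hit. A two-minute illustration in base R:

    # Twenty pure-noise predictors vs. a pure-noise outcome: count the
    # "discoveries" an unsupervised agent could happily report.
    set.seed(1)
    n <- 100
    y <- rnorm(n)
    pvals <- replicate(20, summary(lm(y ~ rnorm(n)))$coefficients[2, 4])
    sum(pvals < 0.05)        # usually >= 1, purely by chance
    p.adjust(pvals, "BH")    # the correction that rarely makes it into the prompt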
I understand this was probably easier to write in Python, but since it's calling out to R, would it have made more sense to write the entire thing in R?
This MCP agent still doesn't defend the statistically illiterate from themselves.
R² without data visualization is savage.
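The classic illustration, using the anscombe data frame that ships with base R: four x/y pairs that look nothing alike, essentially the same R².

    # Anscombe's quartet: four very different x/y pairs, essentially the same
    # R^2 - exactly why a fit summary without a plot is worth so little.
    sapply(1:4, function(i) {
      fit <- lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
      round(summary(fit)$r.squared, 3)
    })
    # All four come out around 0.67; plotting them tells four different stories.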
There's something unsettling about AI agents being able to perform "machine learning" as per the feature list.
Is there a similar Python package that integrates many/all of the (ML & stats) tools included in "R MCP Server"?
rmcp is the name of the official Rust MCP library.
I hate this so much and also great job