I'm trying something similar this semester with my course via AGENTS.md. I think this one is overly verbose and probably falls out of context windows pretty quickly, based on my experience (for me, a very terse but clear set of 30 lines performed better than providing examples and more nuanced explanations during my testing with a few models).
I have included the basic "I am a student -- help me learn, don't just do everything for me," but I also am trying out telling it to generate a .history folder with a markdown history of every prompt and a summary of the action taken in response.
I _know_ there are some tools that offer the prompt history automatically, but I've told students they can use _whatever_ tool they want, as long as they let me know if the folder isn't showing up as they work.
The .history folder is required if they used AI and I intend to review it and try to give specific feedback to the students using it as too much of a crutch.
As a general rule with LLMs, don't just tell it to do something if you actually need to make sure it gets done. Use a hook script to make it do that, or use the history that's already there (transcripts of all sessions are retained in ~/.claude, for example). There are innumerable scripts out there to parse these, or your agent will whip one up for you in 5 minutes.
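A rough sketch of the hook approach, with the assumptions flagged: Claude Code can run a command when a prompt is submitted and hand it the event as JSON on stdin. The field names used here (`prompt`, `session_id`, `cwd`) and the one-file-per-day layout under `.history/` are assumptions for illustration, and both function names are made up, so check your tool's hook documentation before relying on any of it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_history(payload: dict, root: Path, now: Optional[datetime] = None) -> Path:
    """Append one markdown entry for this prompt and return the log file path.

    The payload keys ("prompt", "session_id") are assumed, not guaranteed.
    """
    now = now or datetime.now(timezone.utc)
    history_dir = root / ".history"
    history_dir.mkdir(parents=True, exist_ok=True)
    log_file = history_dir / f"{now:%Y-%m-%d}.md"  # one file per day (assumed layout)
    session = payload.get("session_id", "unknown-session")
    prompt = payload.get("prompt", "").strip()
    entry = f"## {now:%H:%M:%S} UTC (session {session})\n\n{prompt}\n\n"
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return log_file


def handle_hook_event(raw_json: str, default_root: Path) -> Path:
    """Parse the JSON the hook received and log it under the project directory."""
    payload = json.loads(raw_json)
    # Fall back to default_root when the payload carries no "cwd" (assumed key).
    root = Path(payload.get("cwd") or default_root)
    return append_history(payload, root)
```

The hook command itself would then be a tiny script that calls `handle_hook_event(sys.stdin.read(), Path.cwd())`. The point is that the log gets written by the harness on every prompt, whether or not the model remembers it was asked to.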
I was hoping to get them access to a specific tool like GitHub Copilot via GitHub Education, but when I looked, sign-ups were paused, so I went the tool-agnostic approach. Even during installation a lot of them were telling me how "chat" told them to fix their installation issues (but some were clearly using an alternative to ChatGPT, specifically).
However, I see from other comments on this post that I may need to include a CLAUDE.md as a copy (and could maybe just leave the .history part out of that version?).
Worth clarifying: "chat" is actually (linguistically) completely separate from a shorthand for ChatGPT. Livestreamers (e.g. on Twitch/YouTube) often talk to "chat", the people watching. Visually, they're just narrating actions etc. to a third party who is not present.
This has leaked into (some) younger people's vocabulary. A particular example is saying "Chat is this real?"
Ha, I have a 16 year old who yells "chat" often enough when online gaming to be familiar with that use.
I'm positive these students did use an LLM to get the help instead of crowdsourcing, but it is an interesting linguistic overlap.
And while I'm on my "old man" soapbox -- "look it up" and "search for it" somehow became "search it up" with the young people. I corrected my son for years before I started hearing college students also saying it that way...
> "look it up" and "search for it" somehow became "search it up" with the young people
It wouldn't actually be necessary for the phrase "look it up" to exist for this to happen. You're free to apply the particle "up" to pretty much any English verb if you want the semantics that it provides. Compare rustle up, turn up, etc.
You might also want to take note of the episode of Kim Possible where Ron is unsatisfied with the performance of an actor studying to play him, and tells the actor to "Ron it up".
> You're free to apply the particle "up" to pretty much any English verb if you want the semantics that it provides.
I have been speaking English for 20 years but it's my second language. I don't think the semantics of "up" matters when I try to understand phrasal verbs like "turn up". I don't see anything about "up" (as in a direction) in "turn up" or "show up" when it means "to appear" or "to be discovered"... where is the semantic connection?? I think native English speakers just think "up" intrinsically relates to "appear" or "be found" but there's no such connection in other languages I know of.
Similarly with things like "fed up" (as in 'tired of'). Where is the "upness" here?
This is a really interesting thing to think about. English is my mother tongue, and I'd never really considered this, it's always just been part of the language.
If I ask my partner to turn the volume "up," I am asking them to literally move the volume knob "upwards" towards the maximum limit. The physical motion doesn't literally track with televisions and remotes, for example, but you're still moving (turning) the volume upwards towards maximum.
That's how it shakes out in my head? You're moving something upwards towards the maximum. More is bigger, bigger is up.
In Chinese the past is "up" and the future is "down".
Having gotten that into my head, I now get annoyed by the hotkey controls for mpv, which use up arrows and page up to skip into the future and down arrows and page down to skip into the past.
“Fed up” as a phrase comes from feeding livestock up to their fill. It’s very similar to how you would say “filled up”. So the upness comes from raising the level up to the limit.
First, regarding other languages, English gets this usage from its Germanic roots, and you still see that in languages like German, Dutch, Swedish, Danish, and Norwegian.
E.g. in German, words and phrases like "aufessen", "Iss deinen Teller auf", "austrinken", "Trink dein Glas aus". Dutch has "opeten" (eat up) and "opmaken" (use up). Swedish has "äta upp" (eat up) and "dricka upp" (drink up).
So really, English just inherited this and has had it for as long as English has existed.
In fact, the same seems to be true of Germanic languages - the widespread existence of this pattern suggests that it comes from proto-Germanic, the ancestor of Germanic languages spoken around 2000 years ago, or even from earlier Indo-European roots.
As for meaning, it's essentially a metaphorical use of "up" as meaning increasing, completing (fill up), appearing/emerging (come up), improving (touch up) - basically movement towards some completed or improved state which is metaphorically viewed as "up".
> Similarly with things like "fed up" (as in 'tired of'). Where is the "upness" here?
The upness is in having reached a maximum. An interesting comparison is "I've had it up to here!" which makes the metaphorical usage much more explicit.
> I don't think the semantics of "up" matters when I try to understand phrasal verbs like "turn up".
They don't, but they matter a lot when you're coining a new phrasal verb!
(They still matter a little for verbs that already exist. You might see a verb mutate from using one particle to using a different one, but that would probably take hundreds of years.)
Not entirely sure what your point is, but if you're implying that kids don't use "chat" to refer to any LLM (usually ChatGPT) then that's very wrong. It doesn't have anything to do with the usage of "chat" you described.
I don’t think this is right. Certainly one usage of “chat” is the livestream version, as you describe, but both Gen Z and Boomers use “chat” to refer to AI tools specifically
> I think this one is overly verbose and probably falls out of context windows pretty quickly
It likely will. Halfway through a session I routinely watch the agent append my rules to the top of its thinking, only to do exactly what it said it wasn’t going to do after another minute of thinking.
It will then apologize profusely right before doing it again.
I like to think of the initial system prompt as a fuse or bootloader. The user prompt and feedback from tool execution are where most of the alignment comes from in multi-turn use. You want to point the agent in the right direction and then get the hell out of its way.
If your agent isn't performing as expected but can otherwise see and describe the tools as you expect, your mental model of what the tools should be is probably wrong. Adjusting the system prompt can address this, but it quickly bloats and starts to turn into a game of whack-a-mole.
I've got an agent that talks to a very large data warehouse and the system prompt is somewhere around 100 tokens. Most of the important information lives in the user's request and in the environment.
I’ve done something similar for myself to learn Django. Claude Code has a built-in Learning Mode, and I extended that with a Coaching Mode, where it is instructed on how to coach me, how to help stub out features, how to give feedback in code review, etc. The main instruction is never to write code for me when in that mode. It can write basic logic examples/pseudocode and discuss different approaches to the problem. I have found it to be really effective; it is my go-to for learning new things. I’m using it now to learn Elixir.
I wish you good luck! I'd also be interested to hear how you get on. I intend to adopt a similar approach with my classes in September. The .history folder is a great idea.
I told them up front this was new territory and we're all learning from it. If they're overusing it I assured them they won't be dealing with a student integrity issue, but I will be giving them feedback if they're clearly over-reliant and if they continue to do so, it could eventually impact the grade on later assignments.
I'm hoping they learn to use it as a tool instead of trying to offload all cognition to it.
This is a CS course targeted at non-majors, so thankfully the "fundamentals" aren't as critical as the overall themes and general skills.
Love it! I think the power of LLMs to help people acquire new skills and deepen their knowledge is underestimated.
When used correctly, they offer a huge advantage over those who don't use them and think they understand but remain superficial. I encourage you to ask even the most obvious questions.
I always thought it might be good to reframe educational questions in the age of LLMs. Something like "An apple falls on Sir Isaac Newton's head and a photon passes the event horizon of a black hole. Get from A to B." This way, it becomes physically impossible to just query for an answer because the student themselves will have to show the line of reasoning, ideally presenting to the class how they got from A to B.
For those using Claude Code, I recommend Learning mode to instruct Claude to walk you through implementing the solution yourself rather than doing it for you. It’s very helpful when diving into a new domain, and helps build lower level intuition.
To enable it, run /config > output styles > Learning
Learning mode has been a huge help for me; it quickly became my favorite way to learn. I ended up creating a “Coaching Mode” output style that took some of the learning concepts, like stubbing TODOs for the user, and added other instructions that better fit how I learn.
This seems somewhat sensible to me - the genie _is_ out of the bottle, and students absolutely will use AI agents to finish assignments without learning a thing, but there is some value to showing how agents can be used as teaching tools and what healthy use _can_ look like
Same issue as with CliffsNotes. Easy way out means the easy way will be taken. Unless you actually design a decent assignment or exam. With in-person essays or exams, heavily weighted, you are simply screwed if you didn't study the old-fashioned way. A couple of my more serious classes were like this: no homework, no projects, entire grade based on 3 exams. That put the fear of whatever deity you subscribe to into you like nothing else to study hard and not fall behind. One bad exam you can't really come back from. Better luck next year when you retake it. Or, you dig in like hell.
3 tests was already better than the traditional Spanish university class: 1 exam, which is probably written by the department head, not your teacher, and he isn't in any way interested in a high pass rate. Failing 90% of the class might even be positive for them. At that point classes aren't even important: you purchase the tests from the last 10 years, and then you have a prayer of knowing what the bar might be this year.
Teaching, fairness and measuring student performance might seem like similar goals, but it's just so very easy to make sure you succeed at one while messing up the others.
> Teaching, fairness and measuring student performance might seem like similar goals
What? Teaching and measurement are very different goals. The whole point of teaching is to mess with measurements.
This automatically means, by the way, that a huge conflict of interest exists whenever the same party is supposed to be responsible for both instruction and measurement. That's why assessment in a traditional Spanish university is out of the hands of the professor. We should aspire to be more like them.
I tested out of all but the last required Spanish class, so I probably skipped over some early stuff and avoided the deeper stuff. But at that level I remember we'd do oral exams with the TA one-on-one, maybe 15 mins in the hallway. I forget the logistics of it all now. I remember making presentations and class participation in Spanish being important. I can't remember how the written exams went.
The insidious thing here is that students can think they're studying and practicing by chatting with an AI "tutor", which shifts them into a passive observation role that's no better than watching YouTube videos.
It turns out that it's much less memorable if you're too "clear and helpful", so nothing helpful sticks for students. A good teacher (tutor, educator, pick a word) challenges students and makes them the right amount of uncomfortable.
These resources often suck for the college major level anyhow. YouTube and the like are usually dumbed down. Or if it isn't dumbed down, you risk studying beyond the scope of the lecture. Every class I took, the professor would say something like "anything in lecture could end up on the exam." And indeed, every exam consisted of material that came from the slides, and nothing that didn't come from the slides. Even if there was an assigned textbook, there would be so much skipped over, either subtopics or entire chapters. Emphasis can vary by lecturer for the same class as well. The class might fall behind or run ahead of whatever is outlined on the syllabus; that is more an aspirational goal than a solid plan of what to expect.
The best tutor, as always, is your TA or professor, during office hours that you already pay for in tuition. No one takes advantage, though. Well, the students who were already getting As do, just to validate their understanding. The students who really ought to go never go.
I'm a college (physics) professor, and last semester specifically had a huge shift in student behavior. In introductory courses, students basically stopped coming to help sessions.
I give a substantial amount of extra credit for attending regular help sessions which yielded about 30% help session conversion in past semesters. This term it dropped below 5%, and those few who came were the ones who were high B/ low A students. The solid A students don't come because they don't need to. The low B and lower students didn't come because they thought they didn't need to? It's unclear, but clearly something changed.
Students performing in the mid-B and up range weren't affected, but below that? The bottom dropped out. Students who should have earned B's earned C's. Students who could have earned C's... didn't.
I used to love classes like that, and now that I’m a few decades beyond university, I realize they helped me the most. That "do it properly now or everything is going to suck" pressure is good prep for the real world.
They're only cheating themselves in a world that increasingly cares about knowledge (market trend of seniors being preferable hires to fresh out of school juniors) and not the piece of paper that "proved" you had such knowledge.
I agree with you that they are cheating themselves. Unfortunately, a bunch of 18-22 year olds also don't tend to have the maturity to realize that fact. I imagine that the university is trying to nudge them to do the courses in a way that helps themselves because they know otherwise the students won't be wise enough to do that.
These students are making a tradeoff between an abstract notion of "cheating themselves" and a very concrete notion of "having a worse GPA". The second one translates obviously and directly to job prospects.
> The second one translates obviously and directly to job prospects.
I was never asked for my GPA when I first went into the industry. I never put it on my resume and it never even came up. After a decade or so of interviewing candidates, many of whom were recent graduates, I don’t recall any that put their GPA on their resume, and I obviously never cared about it either. Maybe the recruiters did, I dunno.
I’m a data point of one, and I could be wrong, but I don’t think it’s clear and obvious at all that GPA translates to job prospects. At least not in tech.
Not really. If they got admitted to Stanford to begin with, they are smart enough to succeed if they put in the work. So what they are actually trading off against is "I don't want to do the work", which is far less defensible than your reading of the situation.
You are conflating some abstract notion of "succeed"ing (what does that even mean, how can I measure it to two decimal places?) with a very concrete notion of "do this simple thing or get a worse GPA".
Agreed. I don't know how they plan to enforce this but this is way better than some other articles that have come up indicating educational bans on AI use, in-person proctoring, verbal assessments, pen and paper exams etc. This is the first attempt at an approach I've seen that doesn't seek to isolate education from reality; students that are effective at integrating AI into their work and actually understand what they're doing are going to get jobs, which is ultimately the goal of school.
Congrats. This seems like a great prompt to ensure a useful default experience. People should not confuse this with "anti-cheating"; it is instead about helping people learn how to learn.
Do you have further insights on AI and education since?
This would be an interesting approach if the course supplied a custom Harness (perhaps in place of a textbook) and this was part of the instruction set inside of it. As a standalone thing you ask students to import into their agent, seems unlikely to work.
To be fair, shipping these guidelines as AGENTS.md/CLAUDE.md in the repo that contains the assignments will make it so that agents will pick this up without needing students to opt in explicitly. Seems like a reasonable first step to me
Hah, I like that these are presented as a CLAUDE.md.
(They have the same content duplicated in an AGENTS.md as well - I really wish Anthropic would hurry up and teach Claude Code to check for that file too.)
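Until then, keeping the two files identical is easy to script. A minimal sketch, assuming both files sit in the repo root and that AGENTS.md is treated as the source of truth; `mirror_agent_file` is a name I made up, and running it from a pre-commit hook or CI step is my own suggestion, not something the course repo does as far as I know.

```python
from pathlib import Path
from typing import List, Sequence


def mirror_agent_file(
    repo: Path,
    source: str = "AGENTS.md",
    targets: Sequence[str] = ("CLAUDE.md",),
) -> List[Path]:
    """Copy `source` over each target whose content differs.

    Returns the list of files that were (re)written, so a caller such as a
    CI check can fail when the copies had drifted apart.
    """
    text = (repo / source).read_text(encoding="utf-8")
    rewritten = []
    for name in targets:
        target = repo / name
        if not target.exists() or target.read_text(encoding="utf-8") != text:
            target.write_text(text, encoding="utf-8")
            rewritten.append(target)
    return rewritten
```

An empty return value means the copies were already in sync.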
They won’t until the winds change, and people start talking about the tradeoffs of Claude Code vs any of the other thousand good quality agent harnesses out there that recognize AGENTS.md
Opencode is good enough for most workflows IME, even if it doesn’t have the kitchen sink of features as cc
I think people out of school underestimate the power of exams. There's a huge difference in classes recently between ones with and without exams. If there is an exam, people are way more likely to study and therefore actually learn.
I agree. Learning to code from scratch today would be difficult to say the least. The scar tissue accrued from debugging a compilable typo, a misplaced comma or parenthesis teaches something that's hard to recreate - but replacing it with durable learning that won't age-out is a definite win.
This is interesting. I don't know how the AI agent guidelines will be enforced because there will always be a model outside the curriculum that a student can use to bypass the guidelines. Encouraging academic integrity is useful but requires the student to buy into the idea that they are paying for an education, not a diploma. This is a tough problem and I have been wondering how CS departments are incorporating AI into the curriculum while encouraging appropriate use in a learning environment.
I think the answer to "how will AI agent guidelines be enforced" is that they won't be because they can't be, at least not directly.
This doesn't mean that this approach doesn't have value though. I think it very much does.
One way to indirectly enforce use of the AI agent guidelines is via an oral examination where the instructor and student look over their work together and talk about it. Students who have genuinely tried to learn and used AI as a learning tool via the agent guidelines should do a lot better in an oral exam than students who have used AI as a solution generator.
I adopted the oral exam (without agent guidelines) for a course I teach in the academic year just gone, and it worked pretty well. Next term I intend to include the agent guidelines to give them clearer guardrails. Still ultimately optional, but if students choose to ignore them it's gonna be pretty obvious during our conversation.
Stanford has an honour code. Meant no oversight even during exams. Worked surprisingly well when I was there. The flipside is, if you’re ever caught cheating, there are no second chances.
I imagine this applies here, too, if they want to enforce it strictly.
How could you tell? I proctored. People cheat pretty frequently and other students are none the wiser. It really takes like 4 proctors if you want to do it right. Even then I'm sure the clever ones are slipping through. These were scantron though. Short response/essay format you'd be screwed if you didn't know your stuff.
Did you know Theo Baker? He's got a book out about Stanford that says differently.
>Cheating has become omnipresent. I don’t know a single person who hasn’t used A.I. to get through some assignment in college, yet the school was at first slow to realize how widespread this would become. As freshman year went on, some professors suggested that the “nuclear option” might be called for: allowing faculty to proctor in-person exams, a practice banned at the university for over a century to demonstrate “confidence in the honor” of students.
snip
>In junior year, 49 percent of the 849 computer science majors who responded to an annual campus survey said they would rather cheat on an exam than fail.
You mean it worked well for cheaters right? The more I learn about these "honor codes" the more I realize how sheltered these American elites have become.
No, I mean it generally worked well. I can't really say how it worked in undergrad because grad and undergrad school were so separate. I can only talk about grad school. I was really surprised myself how well it worked because I didn't know this coming from my university in Germany, but it really did work, at least in every exam I saw. And I don't consider myself overly naive. I guess it has to do with the fact that people who get admitted to Stanford grad school have already proven a certain work ethic and really want to do the exams to learn more. Your final grade isn't as important when you graduate from Stanford; it only really matters if you want to do a PhD, otherwise it's borderline irrelevant. "I went to Stanford" is all you need for your CV. So I didn't feel a lot of pressure being there of always having to have the best grades, it was more about using your very expensive time there wisely to learn as much as you can, and I felt my peers were the same.
Now, I'm not saying every place can be like that, I'm just trying to explain why at this particular university, the honor code is a reasonable policy that may work perfectly well on policing AI in exams. You can't copy that to other institutions, but it answers how they do it here.
In an ideal world guidelines should be suggestions for those willing to make the best of the course and improve as a person and professional. However a degree has real world value and repercussions, so enabling someone incompetent to do a dangerous job can put innocent lives in jeopardy. It's tough, but I hope in time we learn how to live with this new tech.
I know who Petzold is but I can't remember if I read this at the time. I do love it though. I was thinking a lot about generated code in Visual Studio a couple years after this article.
It's kinda funny to think about various forms of code generation. From compilers to IDEs to parser generators to, now, LLMs. Even several higher level languages that compile to lower level languages are generative, essentially.
Still not a fan of LLMs, but it's always good to remember that the concept isn't entirely new or unique.
Calling an entity that's forbidden from acting on your behalf an "agent" seems funny but maybe it's meant as a catch-all term. Their use of "assistant" seems better for that purpose.
yeah I don't think that's going to work - it would be kind of like "we're releasing model answers to all assignments but please only use them as a teaching aid and don't copy from them"
best to
a) adapt assignments so that agents are bad at producing solutions
b) have more scenarios where students have to do things in controlled environments. Universities managed to adapt to 'any solution you need is readily available online' so I don't think it will be that different to have several times a month/year where students have to go into a room with nothing but pencil and paper to prove what knowledge they have vs what they have the skills to access
Assignments the agent is bad at seems like a losing battle.
Just need to base the mark off the in person test, maybe keep 20-30% to encourage people to still do the assignments. Some will cheat but it will just be hurting them for the test.
20 years ago this was not unheard of. In one exam we had to translate C code to assembly for one of the exercises, convert numbers to IEEE 754 representations and similar, both tasks where access to a laptop would make it possible to cheat. We also had to modify some small computer architecture diagrams, if I recall correctly.
For the linear algebra written exam it didn’t work, because if you learned to solve the previous four years' exams, you could be sure most of it was familiar, so you could just prepare for a few standard exercises without really understanding the content.
Our advanced algorithm course used a bit of a combination, with a take-home project exam (a knapsack-like optimization problem, competing for the fastest implementation) combined with a two-hour written exam with multiple-choice answers, but again only with books, pencil and paper to get to the right answer. This I think could work today, having both the open-ended project and some multiple choice with pencil and paper.
I did most of my CS class tests this way within the last year. It’s not that bad because the prof doesn’t care so much about syntax and details (unless that’s what we’re being tested on, of course), wanting instead to make sure we understand broader concepts.
I agree it's not a complete solution. But since complete solutions don't exist, as a society we are looking for a step in the right direction, and IMO this is one such step. You may argue that it's not a very large step, but I would argue it's still in the right direction and therefore necessary, especially in the education space, and I'm happy to see someone publishing an attempt.
This is a very good baseline for future courses to build on. There will always be a group that wants to jailbreak this, and that's okay, but having baseline agent support for learning is needed in this AI-first world.
I was probably thinking of a future where future devices from education institutions would have these preloaded with a non modifiable version of agents guardrails tuned for learning...
It's a good idea, at the very least it communicates intent to students, but couldn't students just modify CLAUDE.md and not check in their changes to that?
This seems unreasonable to me. One of the best uses of AI is that you can just tell your computer what to do in natural language and it does it. Running bash commands isn't part of the education, it's busywork.
In the context of learning, I think it's good to execute commands yourself and see what happens immediately. Once you allow the agent to execute things, it tends to run several follow-up commands when needed. An example off the top of my head would be running a server that fails because the port is already in use: the agent will easily find the process holding the port, decide whether or not to kill it, and then re-run the command. But if you run the command and see the error yourself, you get the chance to learn what's going on and how to fix it. You can still use the LLM to read the error and explain why it happens, according to these guidelines.
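To make the port example concrete, here is a minimal Python sketch (my own illustration, not from the guidelines) that reproduces the failure by starting two listeners on the same port. `try_bind` is a hypothetical helper; the error it catches is the same "address already in use" a student would see when a second server is started on a port that is still held.

```python
import errno
import socket


def try_bind(port: int, host: str = "127.0.0.1"):
    """Try to listen on host:port.

    Returns (socket, None) on success, or (None, message) when the port is
    already taken. Any other OSError is re-raised unchanged.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock, None
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None, f"port {port} is already in use"
        raise


first, _ = try_bind(0)            # port 0 asks the OS for any free port
port = first.getsockname()[1]     # the port the OS actually picked
second, error = try_bind(port)    # a second "server" on the same port fails
print(error)
first.close()
```

Seeing that message yourself, and then working out which process holds the port, is the learning step that disappears when the agent quietly kills the process and retries.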
A good example of how a university can use agents in courses rather than forbidding them indiscriminately. An official AGENTS.md may become a new pattern in university courses.
I really like this. I'm currently doing a part-time BSc and my current module explicitly allows AI usage as long as you 'cite it'. The guidelines are out of date in that they assume you are using a chatbot and not a coding harness. The temptation to have Claude write all my pandas code has become too difficult for my self-control, but at the same time I actively feel my education is suffering from using it. As I write my final paper I am thankful that I at least despise AI writing too much to use it for the actual marked assessment, but I still feel that I have cheated myself out of part of my education and probably wasted a lot of time going fast in the wrong direction, because generating data frames, graphs, statistics, etc. is just so easy with Claude.
Even though it seems radical, I think the right approach is to simply allow the students to use AI to its full potential, to generate answers, code, whatever.
The onus should be on the instructor to make sure that the student ends up actually understanding and being able to code/solve problems that they pose without using coding agents.
Why? Because:
1. this is exactly what is going on in the real world. People are able to get AI to do whatever the hell they want, but the ones who just use it lazily end up with huge cognitive debts and codebases riddled with opaque bugs that they do not understand whatsoever. If we prevent students from confronting this temptation, then we are sort of coddling or shielding them from it, and not really preparing them to avoid pitfalls of this type.
2. you can actually learn a LOT by being given the answer, if you actually care to learn. i personally think it's pretty fucking lame to handicap a student's ability to learn in an attempt to prevent lazy abuse. isn't the whole point of a grade to measure how well you understand things? can't you have pop quizzes, assignments on a computer with no agent use, written tests, etc etc. to catch the lazy abusers? this is an unnecessary prevention of lazy abuse that unfairly handicaps learning
> you can actually learn a LOT by being given the answer, if you actually care to learn.
Even if you "actually care to learn", this is a huge mental shortcut and you're deceiving yourself if you think deep learning is happening from looking at the answer.
On top of that, the pressure to just finish the coursework and move on to your other homework due tomorrow seems pretty high. Your suggestion means we're no longer coddling/shielding students, but we also aren't actively helping them, are we?
Not from simply looking at the answer. From knowing the answer and reverse-engineering or understanding how to arrive at that answer in the first place. It's not always the best way of learning, but it definitely is a great way to learn if you care to actually understand why it is the answer and how you would have arrived at it.
> Your suggestion means we're no longer coddling/shielding students, but we also aren't actively helping them, are we?
My suggestion is just the former, it doesn't imply the latter.
My understanding is that research shows more learning happens when the student has to struggle with the material to solve problems and answer questions.
Stanford is a research university. The student should have full responsibility for learning outcomes. The university will provide support and opportunities to the extent its resources allow, but it's up to the student to choose if they want to take advantage of that. Those who need a more guided approach to learning can always go to teaching-oriented universities or find a personal tutor.
That's a major reason why employers have traditionally valued degrees from research universities, even if they are not particularly highly ranked. Being able to thrive in an environment like that shows a degree of independence and initiative.
As an educator you have to trust that most students are there to learn, not to cheat. Some will cheat, of course, and you can try to stop it or detect it but it devolves into whack-a-mole and the cheater is probably more motivated than the instructor.
Once they have graduated they will be on the job using LLMs and agents all day long, and their employers not only won't care they will be encouraging or requiring it.
When calorie dense food and gas powered vehicles came on the scene, humans (generally) got fat and out of shape. "Why eat that salad and go for a run?" one might say, "This cheesecake tastes much better and I can just drive wherever I want to go."
Getting fat is one thing, but getting stupid is another, and I really fear for the future of humanity when it becomes so easy to sidestep the processes that let us actually learn and grow because stuff like "using agent ai coding is trivial".
using a coding tool is trivial, correct. so is using a microwave oven or its larger counterparts. you need a certain level of person to know if what came out of it was Michelin-star or not and I do not think Stanford is going for Hot Pockets here.
Is this all an elite educational institution with about $50bil in assets could muster, lol? This is completely and utterly unenforceable, and as such, worthless.
There really needs to be diversity in delivery styles for different modules of courses according to their aims, with 'ai access' as a key variable.
If AI is allowed, it should be based on $x of usage/student, with an audit trail to prove no external funding was used, and module aims based on using AI to the max while conserving token use. Like actually creating wild, ambitious shit which takes cutting edge services to the max.
If AI is not allowed for a module, then it really needs to go back to the old skool, with handwritten exams, or coding using old machines and textbooks. Some skills, techniques, etc, really do need drilling.
Straddling the middle will help nobody, result in accusations, increase the burden on teaching staff, and result in a course without a realistic focus.
Though I guess if you're a big brand university, you don't really need to care about innovating. The money will keep pouring in. The whole further education sector is in dire need of a shake up.
I don’t really know why this is getting downvoted. It’s clear that higher education is degrading because of easy to reach AI solutions that have no type of penalties for use.
During my undergrad it was normal to see people refer to Chegg solutions to get their answers, or ask a friend for theirs.
Maybe there’s a reason my first CS professor wrote out Java code with pencil and paper I guess.
Students are struggling to get work after graduating because they're dropped into a competitive environment. Ideals aren't enough to get jobs in the current environment.
Universities should be places which are at the bleeding edge of development and providing society with the best new ideas/tech, etc has to offer. Junior workers should be hotbeds of exciting talent which have the ability to revolutionise industries.
By creating such milquetoast environments to study in, which are seemingly scared or unable to prepare people for the future, students are being done a disservice.
Far too many people are far too comfortable with their cushty positions, and it's not doing the youth any favours.
I'm confused, are you suggesting students using AI to do their assignments for them and have them learn nothing will benefit them more or less in the future when they enter a competitive environment?
1. Everyone already employed is "cheating" and not using fundamentals. Therefore to prepare them for the workforce they must just learn to "cheat" effectively... at the expense of the "ideals" (read: direct skills or knowledge.)
2. "Milquetoast environments" -- A general "tough love" trope, but I'm unclear on how this tough-school will somehow match the unique issues of the tough-work. Mix incompatible types of difficulty and people are just worse-off.
For that matter, why not flip the argument around? If the future competition is everyone slinging stuff through LLM slop all day, perhaps ensuring students have fundamental skills to differentiate themselves becomes more important, rather than less.
Funny you should say that. This is about Stanford:
>In our tech-enabled, newly A.I.-powered world, students were increasingly fudging just about everything. They would embezzle dorm funds to spend on their friends and lie about having Covid to get the UberEats credits that the school offered to those in quarantine. Some kids I knew published a paper that claimed a groundbreaking new A.I. advancement. Online sleuths quickly pointed out that it appeared to be just a stolen Chinese model, to which the two Stanford co-authors responded by blaming the plagiarism on the third author.
>In junior year, 49 percent of the 849 computer science majors who responded to an annual campus survey said they would rather cheat on an exam than fail. A friend of mine captured the school’s ethos while we were discussing the tech hardware and other items our student club neglected to return to corporate sponsors. It was all, I recall her saying, “just a little bit of fraud.”
This is ridiculous. The genie is not going to go back into the bottle. This is the equivalent of "you wouldn't download a car". (Yes, we would.)
The solution is to scale the difficulty of the objective measures. Expect far more from students.
Reorient the university around physical laboratories and timesharing resources no single student could afford. It's already like this in many STEM disciplines.
More internships, more networking, more large projects. Less trivial tests of knowledge and credentialism.
I mean you kinda have to own a computer to be in this industry at all. Back in the 90s there was a clear distinction between someone who had a computer and someone who didn't in so far as what they could do. Maybe a subscription might be a requirement soon. Overall it's much less money than what my parents had to spend on my computer in the 90s. In inflation adjusted dollars they spent over 5k on just my tech before I was even 13 years old.
Stanford is industry compliant and teaches the youth to outsource their thinking to BigTech. No surprise here, the donors will be happy.
The "but we do not let them write code directly" is a smoke screen to appease critics and parents. Yes, hello parents, you pay for your offspring to become a mindless industry tool.
Seeing my own kids (teens) go through some of this, I'm becoming slightly less pessimistic as it all shakes out. Among their peer groups there does seem to be an opinion forming that sure, anyone can just ask ChatGPT for quick answers on assignments, but actually knowing stuff is a bit of a "flex" that's respected.
300 years ago when I was in high school I had a friend choose to go the HVAC trade school route instead of college. He chose the hardest school in the country where they did most things manually so that students understood how things work. It removed the "magic" some tools provide. I was pretty impressed he was wise enough to do that. He's exceptional at his job by the way.
I think we have a tendency to think the worst of young people. They frequently surprise me though.
I don't think you will achieve much that way. Suppose you pair me with a plumber, an area I know almost nothing about. The plumber is not able to finish anything 'trickier' because they have an ignoramus looking over their shoulder. Maybe I can learn some things by watching, having things explained to me as they work. On the other hand if I just walk away there's no way to tell from the final product. You gotta learn the fundamentals to at least a comparable level to be able to contribute. Same reason you couldn't just google everything on your phone in calculus.
What a ridiculous take. Just because solutions exist in the llm training data, doesn't make these problems 'toy' or 'easy'. The human 'engineering hardness' scale doesn't align with what an llm can and can't do.
As an employer, I want education to be robust from the ground up, not turn uni into an attempt to bootcamp whatever is hot today.
I don't think a 4 year postsecondary education is enough time to make a developer that can hit the ground running. Not if it's 100% of class time on CS theory. Nor if it were 4 years of vocational training and labwork that leaned heavy into AI. Nor some mix. We train on the job heavily, it's just not possible to fit everything into the sausage grinder.
So why not throw in some mandatory non-major electives? Take the time to do stuff that frustrates people who want uni to be a certificate mill. I don't care if green employees are experts at the exact narrow set of tools I use. I want them to be good at learning, and to have gotten most of the standard CS topics out of the way.
I'm trying something similar this semester with my course via AGENTS.md. I think this one is overly verbose and probably falls out of context windows pretty quickly, based on my experience (for me, a very terse but clear set of 30 lines performed better than providing examples and more nuanced explanations during my testing with a few models).
I have included the basic "I am a student -- help me learn, don't just do everything for me," but I also am trying out telling it to generate a .history folder with a markdown history of every prompt and a summary of the action taken in response.
I _know_ there are some tools that offer the prompt history automatically, but I've told students they can use _whatever_ tool they want, but should let me know if the folder isn't showing up as they work.
The .history folder is required if they used AI and I intend to review it and try to give specific feedback to the students using it as too much of a crutch.
I just started this last Friday, so wish me luck!
As a general rule with LLMs, don't just tell it to do something if you actually need to make sure it gets done. Use a hook script to make it do that, or use the history that's already there (transcripts of all sessions are retained in ~/.claude, for example). There are innumerable scripts out there to parse these, or your agent will whip one up for you in 5 minutes.
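To make the "use the history that's already there" route concrete, here is a minimal sketch of such a parsing script. It assumes the transcripts are JSON Lines files (one JSON object per line). The field names `type`, `message`, and `content` are my assumptions about the record layout, not a documented schema, so check them against a real file before relying on this.

```python
import json
from pathlib import Path


def extract_user_prompts(transcript_path):
    """Return the user prompt strings found in one JSON Lines transcript.

    Assumed (hypothetical) record shape:
        {"type": "user", "message": {"content": "..."}}
    Lines that are not valid JSON or do not match that shape are skipped.
    """
    prompts = []
    for line in Path(transcript_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupt lines
        if not isinstance(record, dict) or record.get("type") != "user":
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, str):
            prompts.append(content)
    return prompts


def prompts_to_markdown(prompts):
    """Render the prompts as a numbered markdown list for review."""
    return "\n".join(f"{i}. {p}" for i, p in enumerate(prompts, start=1))
```

Running `prompts_to_markdown(extract_user_prompts(path))` over each session file would give an instructor a reviewable prompt list without depending on the model remembering to write one.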
I was hoping to get them access to a specific tool like GitHub Copilot via GitHub Education, but when I looked, sign-ups were paused, so I went the tool-agnostic approach. Even during installation a lot of them were telling me how "chat" told them to fix their installation issues (but some were clearly using an alternative to ChatGPT, specifically).
However, I see from other comments on this post that I may need to include a CLAUDE.md as a copy (and could maybe just leave the .history part out of that version?).
worth clarifying "chat" is actually (linguistically) completely separate from a shorthand for ChatGPT. Livestreamers (e.g. on twitch/youtube) often talk to "chat", the people watching. Visually, they're just narrating actions etc to a 3rd party who is not present.
This has leaked into (some) younger people's vocabulary. A particular example is saying "Chat is this real?"
https://knowyourmeme.com/memes/chat-is-this-real
but some people use it more freely.
Ha, I have a 16 year old who yells "chat" often enough when online gaming to be familiar with that use.
I'm positive these students did use an LLM to get the help instead of crowdsourcing, but it is an interesting linguistic overlap.
And while I'm on my "old man" soapbox -- "look it up" and "search for it" somehow became "search it up" with the young people. I corrected my son for years before I started hearing college students also saying it that way...
> "look it up" and "search for it" somehow became "search it up" with the young people
It wouldn't actually be necessary for the phrase "look it up" to exist for this to happen. You're free to apply the particle "up" to pretty much any English verb if you want the semantics that it provides. Compare rustle up, turn up, etc.
You might also want to take note of the episode of Kim Possible where Ron is unsatisfied with the performance of an actor studying to play him, and tells the actor to "Ron it up".
> You're free to apply the particle "up" to pretty much any English verb if you want the semantics that it provides.
I have been speaking English for 20 years but it's my second language. I don't think the semantics of "up" matters when I try to understand phrasal verbs like "turn up". I don't see anything about "up" (as in a direction) in "turn up" or "show up" when it means "to appear" or "to be discovered"... where is the semantic connection?? I think native English speakers just think "up" intrinsically relates to "appear" or "be found" but there's no such connection in other languages I know of.
Similarly with things like "fed up" (as in 'tired of'). Where is the "upness" here?
This is a really interesting thing to think about. English is my mother tongue, and I'd never really considered this, it's always just been part of the language.
If I ask my partner to turn the volume "up," I am asking them to literally move the volume knob "upwards" towards the maximum limit. The physical motion doesn't literally track with televisions and remotes, for example, but you're still moving (turning) the volume upwards towards maximum.
That's how it shakes out in my head? You're moving something upwards towards the maximum. More is bigger, bigger is up.
> More is bigger, bigger is up.
In Chinese the past is "up" and the future is "down".
Having gotten that into my head, I now get annoyed by the hotkey controls for mpv, which use up arrows and page up to skip into the future and down arrows and page down to skip into the past.
“Fed up” as a phrase comes from feeding livestock up to their fill. It’s very similar to how you would say “filled up”. So the upness comes from raising the level up to the limit.
It can also mean damage (blow up, tear up) or completeness (clean up, drink up).
Let me clear that up for you!
First, regarding other languages, English gets this usage from its Germanic roots, and you still see that in languages like German, Dutch, Swedish, Danish, and Norwegian.
E.g. in German, words and phrases like "aufessen", "Iss deinen Teller auf", "austrinken", "Trink dein Glas aus". Dutch has "opeten" (eat up) and "opmaken" (use up). Swedish has "äta upp" (eat up) and "dricka upp" (drink up).
So really, English just inherited this and has had it for as long as English has existed.
In fact, the same seems to be true of Germanic languages - the widespread existence of this pattern suggests that it comes from proto-Germanic, the ancestor of Germanic languages spoken around 2000 years ago, or even from earlier Indo-European roots.
As for meaning, it's essentially a metaphorical use of "up" as meaning increasing, completing (fill up), appearing/emerging (come up), improving (touch up) - basically movement towards some completed or improved state which is metaphorically viewed as "up".
> Similarly with things like "fed up" (as in 'tired of'). Where is the "upness" here?
The upness is in having reached a maximum. An interesting comparison is "I've had it up to here!" which makes the metaphorical usage much more explicit.
> I don't think the semantics of "up" matters when I try to understand phrasal verbs like "turn up".
They don't, but they matter a lot when you're coining a new phrasal verb!
(They still matter a little for verbs that already exist. You might see a verb mutate from using one particle to using a different one, but that would probably take hundreds of years.)
Not entirely sure what your point is, but if you're implying that kids don't use "chat" to refer to any LLM (usually ChatGPT) then that's very wrong. It doesn't have anything to do with the usage of "chat" you described.
I don’t think this is right. Certainly one usage of “chat” is the livestream version, as you describe, but both Gen Z and Boomers use “chat” to refer to AI tools specifically
“ I think this one is overly verbose and probably falls out of context windows pretty quickly”
It likely will. Half way through a session I routinely watch the agent append my rules to the top of its thinking only to do exactly what it said it wasn’t going to do after another minute of thinking.
It will then apologize profusely right before doing it again.
As others have said, use hooks.
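For anyone wondering what a hook might look like for the .history idea, here is a rough sketch, not any tool's official API. It assumes the agent hands the hook a JSON object containing a `prompt` field. That key name, the `.history` default, and the per-day file layout are all assumptions for illustration, and how you register the hook depends on the tool.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_prompt(payload, history_dir=".history", now=None):
    """Append one prompt to a per-day markdown file and return the file path.

    `payload` is the dict the hook receives; the "prompt" key is an assumption.
    `now` can be passed in to make the timestamp deterministic in tests.
    """
    now = now or datetime.now(timezone.utc)
    folder = Path(history_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create .history on first use
    log_file = folder / f"{now:%Y-%m-%d}.md"
    prompt = str(payload.get("prompt", "")).strip()
    with log_file.open("a", encoding="utf-8") as f:
        # One heading per prompt, so the file reads as a timeline.
        f.write(f"## {now:%H:%M:%S} UTC\n\n{prompt}\n\n")
    return log_file
```

If the tool pipes its payload to the script on stdin, the entry point would be something like `log_prompt(json.load(sys.stdin))`. The point is that the logging happens in code that always runs, rather than in an instruction the model can forget.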
I like to think of the initial system prompt as a fuse or bootloader. The user prompt and feedback from tool execution are where most of the alignment comes from in multi-turn use. You want to point the agent in the right direction and then get the hell out of its way.
If your agent isn't performing as expected but can otherwise see and describe the tools as you expect, your mental model of what the tools should be is probably wrong. Adjusting the system prompt can address this, but it quickly bloats and starts to turn into a game of whack-a-mole.
I've got an agent that talks to a very large data warehouse and the system prompt is somewhere around 100 tokens. Most of the important information lives in the user's request and in the environment.
I’ve done something similar for myself to learn Django. Claude Code has a built-in Learning Mode, and I extended that with a Coaching Mode, where it is instructed on how to coach me, how to help stub out features, how to give feedback in code review, etc. The main instruction is never to write code for me when in that mode. It can write basic logic examples/pseudo code and discuss different approaches to the problem. I have found it to be really effective, it is my go to for learning new things. I’m using it now to learn Elixir.
This is not long at all compared to what is actually being used in production by my agents and others like Claude Code...
I wish you good luck! And add that I'd be interested to hear how you get on. I intend to adopt a similar approach with my classes in September. The .history folder is a great idea.
How do you intend to assess your students?
I told them up front this was new territory and we're all learning from it. If they're overusing it I assured them they won't be dealing with a student integrity issue, but I will be giving them feedback if they're clearly over-reliant and if they continue to do so, it could eventually impact the grade on later assignments.
I'm hoping they learn to use it as a tool instead of trying to offload all cognition to it.
This is a CS course targeted at non-majors, so thankfully the "fundamentals" aren't as critical as the overall themes and general skills.
"if they're clearly over-reliant and if they continue to do so, it could eventually impact the grade on later assignments."
Not sure if this is fair. Consider some other poor study tool. Would you dock their grade for using externally curated flash cards too much?
Love it! I think the power of LLMs to acquire new skills and deepen the knowledge is underestimated.
When used correctly, they offer a huge advantage over those who don't use them and think they understand but remain superficial. I encourage you to ask even the most obvious questions.
I always thought it might be good to reframe educational questions in the age of LLMs. Something like "An apple falls on Sir Isaac Newton's head and a photon passes the event horizon of a black hole. Get from A to B." This way, it becomes physically impossible to just query for an answer because the student themselves will have to show the line of reasoning, ideally presenting to the class how they got from A to B.
For those using Claude Code, I recommend Learning mode to instruct Claude to walk you through implementing the solution yourself rather than doing it for you. It’s very helpful when diving into a new domain, and helps build lower level intuition.
To enable it, run /config > output styles > Learning
Learning mode has been a huge help for me, it quickly became my favorite way to learn. I ended up creating a “Coaching Mode” output style that took some of the learning concepts like stubbing todos for the users and added other instructions that better fit how I learn
This sounds neat - do you have this publicly available?
I think it’s a part of official Claude code.
This seems somewhat sensible to me - the genie _is_ out of the bottle, and students absolutely will use AI agents to finish assignments without learning a thing, but there is some value to showing how agents can be used as teaching tools and what healthy use _can_ look like
Same issue as with CliffsNotes. Easy way out means the easy way will be taken. Unless you actually design a decent assignment or exam. In person essays or exams, heavily weighted, you are simply screwed if you didn't study the old fashioned way. A couple of my more serious classes were like this: no homework, no projects, entire grade based on 3 exams. That put the fear of whatever deity you subscribe to into you like nothing else to study hard and not fall behind. One bad exam you can't really come back from. Better luck next year when you retake it. Or, you dig in like hell.
3 tests was already better than the traditional Spanish university class: 1 exam. which is probably written by the department head, not your teacher, and he isn't in any way interested in a high pass rate. Failing 90% of the class might even be positive for them. At that point classes aren't even important: You purchase the tests from the last 10 years, and then you have a prayer of knowing what the bar might be this year.
Teaching, fairness and measuring student performance might seem like similar goals, but it's just so very easy to make sure you succeed at one while messing up the others.
> Teaching, fairness and measuring student performance might seem like similar goals
What? Teaching and measurement are very different goals. The whole point of teaching is to mess with measurements.
This automatically means, by the way, that a huge conflict of interest exists whenever the same party is supposed to be responsible for both instruction and measurement. That's why assessment in a traditional Spanish university is out of the hands of the professor. We should aspire to be more like them.
I tested out of all but the last required Spanish class so I probably skipped over some early stuff and avoided the deeper stuff. But at that level I remember we'd do oral exams with the TA 1 on 1 maybe 15 mins in the hallway. I forget the logistics of it all now. I remember making presentations and class participation in Spanish being important. I can't remember how the written exams went.
Pretty sure op is talking about university in Spain for other subjects, not Spanish class at university.
suuper high-value comment
Sorry, no refunds.
The insidious thing here is that students can think they're studying and practicing by chatting with an AI "tutor", which shifts them into a passive observation role that's no better than watching YouTube videos.
It turns out that it's much less memorable if you're too "clear and helpful", so nothing helpful sticks for students. A good teacher (tutor, educator, pick a word) challenges students and makes them the right amount of uncomfortable.
These resources often suck for the college major level anyhow. Youtube and such is all dumbed down usually. Or if it isn't dumbed down, you risk studying beyond the scope of the lecture. Every class I took, the professor would say something like "anything in lecture could end up on the exam." And indeed, every exam was comprised of something that came from the slides, and nothing that didn't come from the slides. Even if there was an assigned textbook, there would be so much skipped over, either subtopics or entire chapters. Emphasis can vary by lecturer for the same class as well. The class might fall behind or run ahead of whatever is outlined on the syllabus; that is more an aspirational goal than a solid plan of what to expect.
The best tutor, as always, is your TA or professor, during office hours that you already pay for in tuition. No one takes advantage though, well the students who were getting As already do just to validate their understanding. The students who really ought to go never go.
I'm a college (physics) professor, and last semester specifically had a huge shift in student behavior. In introductory courses, students basically stopped coming to help sessions.
I give a substantial amount of extra credit for attending regular help sessions which yielded about 30% help session conversion in past semesters. This term it dropped below 5%, and those few who came were the ones who were high B/ low A students. The solid A students don't come because they don't need to. The low B and lower students didn't come because they thought they didn't need to? It's unclear, but clearly something changed.
Students performing in the mid-B and up range weren't affected, but below that? The bottom dropped out. Students who should have earned B's earned C's. Students who could have earned C's... didn't.
I used to love classes like that and now that I’m a few decades beyond university, I realize they helped me the most. That "do it properly now or everything is going to suck" pressure is good prep for the real world.
They're only cheating themselves in a world that increasingly cares about knowledge (market trend of seniors being preferable hires to fresh out of school juniors) and not the piece of paper that "proved" you had such knowledge.
I agree with you that they are cheating themselves. Unfortunately, a bunch of 18-22 year olds also don't tend to have the maturity to realize that fact. I imagine that the university is trying to nudge them to do the courses in a way that helps themselves because they know otherwise the students won't be wise enough to do that.
These students are making a tradeoff between an abstract notion of "cheating themselves" and a very concrete notion of "having a worse GPA". The second one translates obviously and directly to job prospects.
> The second one translates obviously and directly to job prospects.
I was never asked for my GPA when I first went into the industry. I never put it on my resume and it never even came up. After a decade or so of interviewing candidates, many of whom were recent graduates, I don’t recall any that put their GPA on their resume, and I obviously never cared about it either. Maybe the recruiters did, I dunno.
I’m a data point of one, and I could be wrong, but I don’t think it’s clear and obvious at all that GPA translates to job prospects. At least not in tech.
they should learn as much as they can + cheat for optimal results
Not really. If they got admitted to Stanford to begin with, they are smart enough to succeed if they put in the work. So what they are actually trading off against is "I don't want to do the work", which is far less defensible than your reading of the situation.
You are conflating some abstract notion of "succeed"ing (what does that even mean, how can I measure it to two decimal places?) with a very concrete notion of "do this simple thing or get a worse GPA".
Agreed. I don't know how they plan to enforce this but this is way better than some other articles that have come up indicating educational bans on AI use, in-person proctoring, verbal assessments, pen and paper exams etc. This is the first attempt at an approach I've seen that doesn't seek to isolate education from reality; students that are effective at integrating AI into their work and actually understand what they're doing are going to get jobs, which is ultimately the goal of school.
I think these are based on the one I posted a while back:
https://gist.github.com/1cg/a6c6f2276a1fe5ee172282580a44a7ac
Yes absolutely! We linked your version inside the extended AI policy document, but forgot to add it to our website cs336.stanford.edu
that's awesome i'm glad you guys found it useful
please let me know if you make improvements, I'd love to iterate on it
Congrats. This seems like a great prompt to ensure a useful default experience. People should not confuse this with "anti cheating" and instead helping people learn how to learn.
Do you have further insights on AI and education since?
Seems like a pretty close copy of Carson's (of HTMX fame) agent.md from 5 months ago
https://gist.github.com/1cg/a6c6f2276a1fe5ee172282580a44a7ac
They reference the gist of 1cg in the honor code section of CS336.
https://cs336.stanford.edu/
This would be an interesting approach if the course supplied a custom Harness (perhaps in place of a textbook) and this was part of the instruction set inside of it. As a standalone thing you ask students to import into their agent, seems unlikely to work.
To be fair, shipping these guidelines as AGENTS.md/CLAUDE.md in the repo that contains the assignments will make it so that agents will pick this up without needing students to opt in explicitly. Seems like a reasonable first step to me
Hah, I like that these are presented as a CLAUDE.md.
(They have the same content duplicated in an AGENTS.md as well - I really wish Anthropic would hurry up and teach Claude Code to check for that file too.)
We symlink AGENTS.md and CLAUDE.md to a single file in our repo
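In case it helps anyone setting this up, here is a small sketch of that symlink arrangement. The shared file name `AI_GUIDELINES.md` is made up for illustration, since the actual name isn't given above. Note that creating symlinks on Windows may need extra privileges.

```python
from pathlib import Path


def link_agent_files(repo_dir, shared="AI_GUIDELINES.md",
                     aliases=("AGENTS.md", "CLAUDE.md")):
    """Point each alias in `repo_dir` at one shared guidelines file.

    `shared` is a hypothetical file name; pass whatever your repo uses.
    Returns the list of symlink paths that were created.
    """
    repo = Path(repo_dir)
    if not (repo / shared).is_file():
        raise FileNotFoundError(f"{repo / shared} does not exist")
    links = []
    for name in aliases:
        link = repo / name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale copy or an old link
        # A relative target keeps the link valid when the repo is cloned elsewhere.
        link.symlink_to(shared)
        links.append(link)
    return links
```

After this, editing the shared file updates what every agent sees, so the two copies cannot drift apart.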
You can also include other md files like AGENTS.md in CLAUDE.md:
They won't, because forcing the file to be named after their product is an intentional marketing choice. Free advertising on every repo that has it.
They won’t until the winds change, and people start talking about the tradeoffs of Claude Code vs any of the other thousand good quality agent harnesses out there that recognize AGENTS.md
Opencode is good enough for most workflows IME, even if it doesn’t have the kitchen sink of features as cc
> I really wish Anthropic would hurry up and teach Claude Code to check for that file too.
Surely such a trivial feature could be implemented in seconds using e.g. Claude? It's not about them not "hurrying up".
I wouldn't hold my breath.
i think people out of school underestimate the power of exams. there's a huge difference in classes recently between ones with and without exams. if there is an exam, people are way more likely to study and therefore actually learn
This is such a realistic balance between completely banning coding agents and embracing the spirit of higher education
I agree. Learning to code from scratch today would be difficult to say the least. The scar tissue accrued from debugging a compilable typo, a misplaced comma or parenthesis teaches something that's hard to recreate - but replacing it with durable learning that won't age-out is a definite win.
Why let the robot speak for you?
I'd be interested to hear which sequence of tokens triggered your "robot authored" pattern match? You may want to tune the mechanism.
no tuning necessary you got caught with your pants down. Now write a poem on the efficacy of hammers in solving headaches.
+2 to the unnameable pattern match
This is interesting. I don't know how the AI agent guidelines will be enforced because there will always be a model outside the curriculum that a student can use to bypass the guidelines. Encouraging academic integrity is useful but requires the student to buy into the idea that they are paying for an education, not a diploma. This is a tough problem and I have been wondering how CS departments are incorporating AI into the curriculum while encouraging appropriate use in a learning environment.
I think the answer to "how will AI agent guidelines be enforced" is that they won't be because they can't be, at least not directly.
This doesn't mean that this approach doesn't have value though. I think it very much does.
One way to indirectly enforce use of the AI agent guidelines is via an oral examination where the instructor and student look over their work together and talk about it. Students who have genuinely tried to learn and used AI as a learning tool via the agent guidelines should do a lot better in an oral exam than students who have used AI as a solution generator.
I adopted the oral exam (without agent guidelines) for a course I teach in the academic year just gone, and it worked pretty well. Next term I intend to include the agent guidelines to give them clearer guardrails. Still ultimately optional, but if students choose to ignore them it's gonna be pretty obvious during our conversation.
Well, no amount of instructions would work if the student has no intention to learn anything.
Stanford has an honour code. Meant no oversight even during exams. Worked surprisingly well when I was there. The flipside is, if you’re ever caught cheating, there are no second chances.
I imagine this applies here, too, if they want to enforce it strictly.
>Worked surprisingly well when I was there.
How could you tell? I proctored. People cheat pretty frequently and other students are none the wiser. It really takes like 4 proctors if you want to do it right. Even then I'm sure the clever ones are slipping through. These were scantron though. Short response/essay format you'd be screwed if you didn't know your stuff.
Once upon a time. The modern cheating student does the following:
- Quick cell phone photograph of the page, phone placed in lap
- OCR + AI = answer
- Glance down, copy
- Repeat
For a more surreptitious variant, use the front-facing camera against the bottom of the page, then flip the page. Cell phone can remain in lap.
Marc Tessier-Lavigne was Stanford's president from 2016 to 2023. Not sure if the honor code means anything nowadays.
At least he had the honor to resign when caught, instead of doubling down on his innocence.
Did you know Theo Baker? He's got a book out about Stanford that says differently.
>Cheating has become omnipresent. I don’t know a single person who hasn’t used A.I. to get through some assignment in college, yet the school was at first slow to realize how widespread this would become. As freshman year went on, some professors suggested that the “nuclear option” might be called for: allowing faculty to proctor in-person exams, a practice banned at the university for over a century to demonstrate “confidence in the honor” of students.
snip
>In junior year, 49 percent of the 849 computer science majors who responded to an annual campus survey said they would rather cheat on an exam than fail.
https://www.nytimes.com/2026/05/17/opinion/chatgpt-ai-colleg...
You mean it worked well for cheaters right? The more I learn about these "honor codes" the more I realize how sheltered these American elites have become.
No, I mean it generally worked well. I can't really say how it worked in undergrad because grad and undergrad school were so separate. I can only talk about grad school. I was really surprised myself how well it worked because I didn't know this coming from my university in Germany, but it really did work, at least in every exam I saw. And I don't consider myself overly naive. I guess it has to do with the fact that people who get admitted to Stanford grad school have already proven a certain work ethic and really want to do the exams to learn more. Your final grade isn't as important when you graduate from Stanford; it only really matters if you want to do a PhD, otherwise it's borderline irrelevant. "I went to Stanford" is all you need for your CV. So I didn't feel a lot of pressure being there of always having to have the best grades, it was more about using your very expensive time there wisely to learn as much as you can, and I felt my peers were the same.
Now, I'm not saying every place can be like that, I'm just trying to explain why at this particular university, the honor code is a reasonable policy that may work perfectly well on policing AI in exams. You can't copy that to other institutions, but it answers how they do it here.
In an ideal world guidelines should be suggestions for those willing to make the best of the course and improve as a person and professional. However a degree has real world value and repercussions, so enabling someone incompetent to do a dangerous job can put innocent lives in jeopardy. It's tough, but I hope in time we learn how to live with this new tech.
When reading this I'm reminded of Charles Petzold's 2005 article "Does Visual Studio Rot The Mind?". https://www.charlespetzold.com/etcetera/DoesVisualStudioRotT...
I know who Petzold is but I can't remember if I read this at the time. I do love it though. I was thinking a lot about generated code in Visual Studio a couple years after this article.
It's kinda funny to think about various forms of code generation. From compilers to IDEs to parser generators to, now, LLMs. Even several higher level languages that compile to lower level languages are generative, essentially.
Still not a fan of LLMs, but it's always good to remember that the concept isn't entirely new or unique.
Calling an entity that's forbidden from acting on your behalf an "agent" seems funny but maybe it's meant as a catch-all term. Their use of "assistant" seems better for that purpose.
yeah I don't think that's going to work - it would be kind of like "we're releasing model answers to all assignments but please only use them as a teaching aid and don't copy from them"
best to
a) adapt assignments so that agents are bad at producing solutions
b) have more scenarios where students have to do things in controlled environments. Universities managed to adapt to 'any solution you need is readily available online' so I don't think it will be that different to have several times a month/year where students have to go into a room with nothing but pencil and paper to prove what knowledge they have vs what they have the skills to access
Assignments the agent is bad at seems like a losing battle. Just need to base the mark off the in person test, maybe keep 20-30% to encourage people to still do the assignments. Some will cheat but it will just be hurting them for the test.
Laptop without internet access, sure. Pencil and paper? that is brutal :)
20 years ago this was not unheard of. In one exam we had to translate C code to assembly for one of the exercises, convert numbers to IEEE 754 representations and similar, both tasks where access to a laptop would make it possible to cheat. We also had to modify some small computer architecture diagrams, if I recall correctly.
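For anyone who has not done that exam exercise by hand, here is a minimal sketch of the number-to-IEEE 754 conversion in Python. It assumes single precision (the comment does not say which format the exam used), and the function name `float_to_ieee754_bits` is made up for illustration:

```python
import struct


def float_to_ieee754_bits(x: float) -> str:
    """Return the 32-bit IEEE 754 single-precision pattern of x,
    formatted as 'sign exponent fraction' (1 + 8 + 23 bits).

    Assumption: single precision; a double would use 1 + 11 + 52 bits.
    """
    # Pack as a big-endian 32-bit float, then reinterpret the same
    # four bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


# -6.25 = -1.1001b * 2^2, so sign = 1, exponent = 2 + 127 = 129,
# and the fraction field starts with 1001.
print(float_to_ieee754_bits(-6.25))
```

On paper you do the same three steps: write the value in binary, normalise it to 1.xxx times a power of two, then add the bias of 127 to the exponent.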
For the linear algebra written exam it didn't work, because if you learned to solve the previous 4 years' exams, you could be sure most of it was familiar, so you could just prepare for a few standard exercises without really understanding the content.
Our advanced algorithms course used a bit of a combination, with a take-home project exam (a knapsack-like optimization problem, competing for the fastest implementation) combined with a two-hour written exam with multiple-choice answers, but again only with books, pencil and paper to get to the right answer. This I think could work today, having both the open-ended project and some multiple choice with pencil and paper.
Pencil and paper - that's the classical way a lot of developers from western europe were taught to code. Back to the basics
I did most of my CS class tests this way within the last year. It's not that bad because the prof doesn't care so much about syntax and details (unless that's what we're being tested on, of course), wanting instead to make sure we understand broader concepts.
I agree it's not a complete solution. But as those don't exist, as a society we are looking for a step in the right direction, and IMO this is one such step. You may object that it's not a very large step, but I would argue it's still in the right direction and therefore necessary, especially in the education space, and I'm happy to see someone publishing an attempt.
"Guidelines" eh?
Reminds me of this: https://www.youtube.com/watch?v=k9ojK9Q_ARE
This is a very good baseline for future courses to build on. There will always be a group that wants to jailbreak this, and that's okay, but having baseline agent support for learning is needed in this AI-first world.
Jailbreaking isn’t even needed - you can just modify the file
I was probably thinking of a future where devices from educational institutions would come preloaded with a non-modifiable version of agent guardrails tuned for learning...
It's a good idea, at the very least it communicates intent to students, but couldn't students just modify CLAUDE.md and not check in their changes to that?
>Don't: Run bash commands
This seems unreasonable to me. One of the best uses of AI is that you can just tell your computer what to do in natural language and it does it. Running bash commands isn't part of the education, it's busy work.
In the context of learning, I think it's good to execute commands yourself and see what happens immediately. Once you allow the agent to execute things, it tends to run several follow-up commands when needed. An example off the top of my head would be running a server that fails because the port is already in use: the agent will easily find what is using the port, decide whether to kill it or not, and then re-run the command. But if you run the command yourself and see the error, you get the chance to learn what's going on and how to fix it. You can still use the LLM to read the error and explain to you why this happens, according to these guidelines.
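The port-in-use failure from that example is easy to reproduce and check by hand. A minimal sketch in Python, assuming a TCP server on localhost; the helper name `port_in_use` and port 8000 are hypothetical, chosen only for illustration:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding a TCP socket to (host, port) fails.

    This mirrors what a server does at startup: if the bind raises
    OSError (typically 'Address already in use'), something else
    already holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    # Hypothetical port; substitute whatever your server tried to use.
    print(port_in_use(8000))
```

Seeing the `OSError` yourself, then looking up which process holds the port, is the learning step the agent would otherwise skip past for you.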
Why wouldn't it be a part of the education?
A good example of how a university can use agents in courses rather than forbidding them without distinction. An official AGENTS.md may become a new pattern in university courses.
Interesting. It makes me think of the idea of fighting piracy by providing a solid legal alternative through streaming platforms, etc.
I really like this. I'm currently doing a part time BSc and my current module explicitly allows AI usage as long as you 'cite it'. The guidelines are out of date in that they assume you are using a chatbot and not a coding harness. The temptation to have claude write all my pandas code has become too difficult for my self control, but at the same time I actively feel my education is suffering from using it. As I write my final paper I am thankful that I at least despise AI writing too much to use it for the actual marked assessment, but I still feel that I have cheated myself out of part of my education and probably wasted a lot of time going fast in the wrong direction because generating data frames, graphs, statistics, etc. is just so easy with claude
> What AI Agents SHOULD NOT Do
> * Run bash commands
Students who prefer to use zsh keep winning.
zsh is fine, but I prefer fish. It has a funnier name!
I just took a C1 Spanish class and it had almost exactly the same instructions. Hmmm and I do not wonder why...
I'm definitely going to use a variation of it for learning new programming languages.
Even though it seems radical, I think the right approach is to simply allow the students to use AI to its full potential, to generate answers, code, whatever.
The onus should be on the instructor to make sure that the student ends up actually understanding and being able to code/solve problems that they pose without using coding agents.
Why? Because:
1. this is exactly what is going on in the real world. People are able to get AI to do whatever the hell they want, but the ones who just use it lazily end up with huge cognitive debts and codebases riddled with opaque bugs that they do not understand whatsoever. If we prevent students from confronting this temptation, then we are sort of coddling or shielding them from it, and not really preparing them to avoid pitfalls of this type.
2. you can actually learn a LOT by being given the answer, if you actually care to learn. i personally think it's pretty fucking lame to handicap a student's ability to learn in an attempt to prevent lazy abuse. isn't the whole point of a grade to measure how well you understand things? can't you have pop quizzes, assignments on a computer with no agent use, written tests, etc etc. to catch the lazy abusers? this is an unnecessary prevention of lazy abuse that unfairly handicaps learning
> you can actually learn a LOT by being given the answer, if you actually care to learn.
Even if you "actually care to learn", this is a huge mental shortcut and you're deceiving yourself if you think deep learning is happening from looking at the answer.
On top of that, the pressure to just finish the coursework and move on to your other homework due tomorrow seems pretty high. Your suggestion means we're no longer coddling/shielding students, but we also aren't actively helping them, are we?
Not from simply looking at the answer. From knowing the answer and reverse-engineering or understanding how to arrive at that answer in the first place. It's not always the best way of learning, but it definitely is a great way to learn if you care to actually understand why it is the answer and how you would have arrived at it.
> Your suggestion means we're no longer coddling/shielding students, but we also aren't actively helping them, are we?
My suggestion is just the former, it doesn't imply the latter.
My understanding is that research shows more learning happens when the student has to struggle with the material to solve problems and answer questions.
Stanford is a research university. The student should have full responsibility for learning outcomes. The university will provide support and opportunities to the extent its resources allow, but it's up to the student to choose if they want to take advantage of that. Those who need a more guided approach to learning can always go to teaching-oriented universities or find a personal tutor.
That's a major reason why employers have traditionally valued degrees from research universities, even if they are not particularly highly ranked. Being able to thrive in an environment like that shows a degree of independence and initiative.
And, yes students are going to follow it....
Finally some sanity when it comes to AI.
I am questioning the experience level of the TA if they think that AGENTS.md is going to work.
As an educator you have to trust that most students are there to learn, not to cheat. Some will cheat, of course, and you can try to stop it or detect it but it devolves into whack-a-mole and the cheater is probably more motivated than the instructor.
Once they have graduated they will be on the job using LLMs and agents all day long, and their employers not only won't care they will be encouraging or requiring it.
I always wonder why there is such a course. Using an agent AI coding tool is trivial.
When calorie dense food and gas powered vehicles came on the scene, humans (generally) got fat and out of shape. "Why eat that salad and go for a run?" one might say, "This cheesecake tastes much better and I can just drive wherever I want to go."
Getting fat is one thing, but getting stupid is another, and I really fear for the future of humanity when it becomes so easy to sidestep the processes that let us actually learn and grow because stuff like "using agent ai coding is trivial".
There are different skills at play, and they're both as valuable as each other.
They shouldn't be thrown into a big soup with shaky aims.
We still - as a society - manage to have PE and driving as different subjects. The same can equally apply here.
using a coding tool is trivial, correct. so is using a microwave oven or its larger counter-parts. you need a certain level of person to know if what came out of it was Michelin-star or not and I do not think Stanford is going for Hot Pockets here.
this is like telling a student driver they have to push their car by hand instead of putting their foot on the pedal.
Is this all an elite educational institution with about $50bil in assets could muster, lol? This is completely and utterly unenforceable, and as such, worthless.
There really needs to be diversity in delivery styles for different modules of courses according to their aims, with 'ai access' as a key variable.
If AI is allowed, it should be based on $x of usage/student, with an audit trail to prove no external funding was used, and module aims based on using AI to the max while conserving token use. Like actually creating wild, ambitious shit which takes cutting edge services to the max.
If AI is not allowed for a module, then it really needs to go back to the old skool, with handwritten exams, or coding using old machines and textbooks. Some skills, techniques, etc, really do need drilling.
Straddling the middle will help nobody, result in accusations, increase the burden on teaching staff, and result in a course without a realistic focus.
Though I guess if you're a big brand university, you don't really need to care about innovating. The money will keep pouring in. The whole further education sector is in dire need of a shake up.
I don’t really know why this is getting downvoted. It’s clear that higher education is degrading because of easy to reach AI solutions that have no type of penalties for use.
During my undergrad it was normal to see people refer to Chegg solutions to get their answers, or ask a friend for theirs.
Maybe there’s a reason my first CS professor wrote out Java code with pencil and paper I guess.
What's the estimated RoI on doing this course?
I am really baffled by the comments in the spirit of "this is unenforceable, and therefore worthless".
I bet most people would not steal even if they knew they could get away with it.
Students are struggling to get work after graduating because they're dropped into a competitive environment. Ideals aren't enough to get jobs in the current environment.
Universities should be places which are at the bleeding edge of development, providing society with the best new ideas, tech, etc. they have to offer. Junior workers should be hotbeds of exciting talent with the ability to revolutionise industries.
By creating such milquetoast environments to study in, which are seemingly scared or unable to prepare people for the future, students are being done a disservice.
Far too many people are far too comfortable with their cushty positions, and it's not doing the youth any favours.
I'm confused, are you suggesting that students using AI to do their assignments for them and learning nothing will benefit them more or less in the future when they enter a competitive environment?
I feel it's either:
1. Everyone already employed is "cheating" and not using fundamentals. Therefore to prepare them for the workforce they must just learn to "cheat" effectively... at the expense of the "ideals" (read: direct skills or knowledge.)
2. "Milquetoast environments" -- A general "tough love" trope, but I'm unclear on how this tough-school will somehow match the unique issues of the tough-work. Mix incompatible types of difficulty and people are just worse-off.
For that matter, why not flip the argument around? If the future competition is everyone slinging LLM slop all day, perhaps ensuring students have fundamental skills to differentiate themselves becomes more important, rather than less.
Funny you should say that. This is about Stanford:
>In our tech-enabled, newly A.I.-powered world, students were increasingly fudging just about everything. They would embezzle dorm funds to spend on their friends and lie about having Covid to get the UberEats credits that the school offered to those in quarantine. Some kids I knew published a paper that claimed a groundbreaking new A.I. advancement. Online sleuths quickly pointed out that it appeared to be just a stolen Chinese model, to which the two Stanford co-authors responded by blaming the plagiarism on the third author.
>In junior year, 49 percent of the 849 computer science majors who responded to an annual campus survey said they would rather cheat on an exam than fail. A friend of mine captured the school’s ethos while we were discussing the tech hardware and other items our student club neglected to return to corporate sponsors. It was all, I recall her saying, “just a little bit of fraud.”
https://www.nytimes.com/2026/05/17/opinion/chatgpt-ai-colleg...
I mean, some would say that's how this whole thing got started.
Related:
CS336: Language Modeling from Scratch
https://news.ycombinator.com/item?id=48357075
This is ridiculous. The genie is not going to go back into the bottle. This is the equivalent of "you wouldn't download a car". (Yes, we would.)
The solution is to scale the difficulty of the objective measures. Expect far more from students.
Reorient the university around physical laboratories and timesharing resources no single student could afford. It's already like this in many STEM disciplines.
More internships, more networking, more large projects. Less trivial tests of knowledge and credentialism.
> The solution is to scale the difficulty of the objective measures. Expect far more from students.
So now students are _required_ to use agents? That's a bit crazy
Would universities be paying the token cost? Is that you Dario?
> Would universities be paying the token cost? Is that you Dario?
A Claude subscription is like 1/5th the cost of one textbook.
I mean you kinda have to own a computer to be in this industry at all. Back in the 90s there was a clear distinction between someone who had a computer and someone who didn't in so far as what they could do. Maybe a subscription might be a requirement soon. Overall it's much less money than what my parents had to spend on my computer in the 90s. In inflation adjusted dollars they spent over 5k on just my tech before I was even 13 years old.
Stanford is industry compliant and teaches the youth to outsource their thinking to BigTech. No surprise here, the donors will be happy.
The "but we do not let them write code directly" is a smoke screen to appease critics and parents. Yes, hello parents, you pay for your offspring to become a mindless industry tool.
good intention but useless let's be real
Seeing my own kids (teens) go through some of this, I'm becoming slightly less pessimistic as it all shakes out. Among their peer groups there does seem to be an opinion forming that sure, anyone can just ask ChatGPT for quick answers on assignments, but actually knowing stuff is a bit of a "flex" that's respected.
300 years ago when I was in high school I had a friend choose to go the HVAC trade school route instead of college. He chose the hardest school in the country where they did most things manually so that students understood how things work. It removed the "magic" some tools provide. I was pretty impressed he was wise enough to do that. He's exceptional at his job by the way.
I think we have a tendency to think the worst of young people. They frequently surprise me though.
Teens also fucking hate AI, on a cultural-ideological level.
I think they may hate what it may be doing to their future outlooks but they use it as much as they do social media
Yeah, that's exactly why I added the second clause.
Nevertheless. The peer pressure is to be anti-AI.
They use it and hate it at the same time.
I hate it sometimes too. But it's sort of like being mad at math.
Pangram reports as 100% AI generated. Makes sense for a README, but a tad bit funny given that their students must hand-write code
As an employer, I want AI to be fully allowed for assignments, and the assignments to be made trickier to compensate.
Let's train people to use all the tools available to solve the hardest problems, rather than solving toy problems with a slide rule.
I don't think you will achieve much that way. Suppose you pair me with a plumber, an area I know almost nothing about. The plumber is not able to finish anything 'trickier' because they have an ignoramus looking over their shoulder. Maybe I can learn some things by watching, having things explained to me as they work. On the other hand if I just walk away there's no way to tell from the final product. You gotta learn the fundamentals to at least a comparable level to be able to contribute. Same reason you couldn't just google everything on your phone in calculus.
You have to balance that with teaching the skills needed to understand the domain sufficiently to take over when the model gets things wrong.
What a ridiculous take. Just because solutions exist in the llm training data, doesn't make these problems 'toy' or 'easy'. The human 'engineering hardness' scale doesn't align with what an llm can and can't do.
As an employer, I want education to be robust from the ground up, not turn uni into an attempt to bootcamp whatever is hot today.
I don't think a 4 year postsecondary education is enough time to make a developer that can hit the ground running. Not if it's 100% of class time on CS theory. Nor if it were 4 years of vocational training and labwork that leaned heavy into AI. Nor some mix. We train on the job heavily, it's just not possible to fit everything into the sausage grinder.
So why not throw in some mandatory non-major electives? Take the time to do stuff that frustrates people who want uni to be a certificate mill. I don't care if green employees are experts at the exact narrow set of tools I use. I want them to be good at learning, and to have gotten most of the standard CS topics out of the way.