Welp, for the first semester ever, SOTA LLMs can do *every single assignment, from scratch (readmes, etc.), and get 100%*.
-
@lindsey @jeremysiek @krismicinski yeah... our first-year courses (~400 students) are trying a lightweight version of this with 10-minute chats with a random sample of students each week, so I think each student meets twice a term. It seems to be going well, but it's still a stretch even with an additional faculty member volunteering to help with these(!). The deeper material in later courses definitely needs more than 10 minutes.
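For a rough sense of why this is a stretch, the load can be estimated with a back-of-the-envelope sketch. The 400 students, 10-minute chats, and two meetings per student come from the post above; the 12-week term, the seed, and the name `schedule_chats` are assumptions made purely for illustration, not a description of how the course actually schedules anything.

```python
import random

def schedule_chats(students, weeks, meetings_per_student=2, seed=0):
    """Assign each student to `meetings_per_student` distinct random weeks."""
    rng = random.Random(seed)
    schedule = {week: [] for week in range(1, weeks + 1)}
    for student in students:
        # sample() picks distinct weeks, so nobody is booked twice in one week
        for week in rng.sample(range(1, weeks + 1), meetings_per_student):
            schedule[week].append(student)
    return schedule

students = [f"student{i:03d}" for i in range(400)]
schedule = schedule_chats(students, weeks=12)  # 12 weeks is an assumption

# 400 students x 2 meetings x 10 minutes each
total_minutes = sum(len(group) for group in schedule.values()) * 10
print(f"{total_minutes / 60:.1f} staff-hours per term")  # prints 133.3
```

Whatever the real term length, the total stays at roughly 133 staff-hours of chat time per term; only the weekly spread changes.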
@csgordon @lindsey @jeremysiek @krismicinski I know it doesn't help much, but I ask my students to produce a short report alongside the technical submission.
The idea is that you can eyeball the differences between the written report and the code submission, and identify where more interrogation is required. I know genai could produce these reports, but every little helps.
Submission canaries, for want of a better phrase.
@jfdm @lindsey @jeremysiek @krismicinski I've long had students do a form of this with their assignments, but the generated reports got "good" enough a year and a half or so ago that I spent more time waffling on whether something crossed the line, plus basically arguing politely with offended students, than I would have spent just meeting with everyone. (I'm sure this balance point varies depending on how exactly you set things up with the report, but I couldn't figure out a more effective setup.) Meeting with everyone has ended up more equal for students, less frustrating for me, and more positive (i.e., non-adversarial) for students. The tipping point also depends on enrollment.
@csgordon @lindsey @jeremysiek @krismicinski There are two routes I have considered:
1. learn how to write more open-book take-home assignments;
2. get students to do programming coursework in the lab in exam conditions.
Either way, we are doomed.
@jfdm @csgordon @lindsey @jeremysiek @krismicinski at least you could sell 2. as "practice for coding interviews", assuming that's still a thing when they graduate...

@jfdm @csgordon @lindsey @jeremysiek @krismicinski
"we are doomed" is an incredibly disappointing take. You should have come to my "GenAI and CS Ed" talk (-:.
If our only value-add was "my course was gated behind a needlessly difficult thing", that doesn't say much for the value of our courses.
@shriramk @csgordon @lindsey @jeremysiek @krismicinski right, so to be clear on these things, the doom part is because the actions require not-so-trivial changes in how we do things. Within UK academia we are under lots of pressure, with not a lot of time, and without the power and influence that 'full chairs' have in the States.
Moreover,
1. Open-book assignments do not exclude the use of GenAI; they can embrace it, and you can guard against or incorporate its use. Such assignments, in my experience, are harder to design, and require training to do well.
2. Exam conditions are also important, as we want students to not rely on GenAI, and to ensure they have the fundamentals down.
The argument about GenAI must be how our forefathers thought about pocket calculators... and how their forefathers thought about slide rules, and so on.
@jfdm @csgordon @lindsey @jeremysiek @krismicinski
Yes, it requires a fair bit of work to re-jig things. I can't speak for UK academia. But I view it as my job to figure out how to upgrade courses for my students.
I don't know what power you think "full chairs" have in the US. You may be mistaking us for German Lehrstühle. We aren't! US "assistant professors" are not "assistants" to any "professors", for instance. I may wish I had some; I don't. <-;
-
@krismicinski @jfdm @csgordon @lindsey @jeremysiek
Good, you seem to understand the assignment! Now go out and design and run a course that puts that into practice! Listening to me talk about it is 50 fewer minutes you'll have, and which you can't get back, to design your course. (-:
-
@krismicinski @shriramk @jfdm @csgordon @lindsey @jeremysiek I've just seen a different course proposal, Responsible AI, for designers of AI systems (argues that responsibility must be built into the design).
-
@krismicinski @jfdm @csgordon @lindsey @jeremysiek
I think I *am* interested in a kind of "how to use AI to do CS". I believe it can be done wisely and smartly.
-
@krismicinski @jfdm @csgordon @lindsey @jeremysiek
Notably, I'm talking about *novice* CS. For upper-level CS, it's pretty clear there are all kinds of interesting possibilities.
-
@shriramk @jfdm @csgordon @lindsey @jeremysiek okay, wow--I did not really expect that. Interesting, I will have to think about that.
-
@shriramk @jfdm @csgordon @lindsey @jeremysiek I think once you trust that the student could in principle write the code (and they're treating it like code the prof gave them, code their coworker wrote, etc.) then what you're saying is right. The concern is: "go through a whole college career and just have Claude Code do every single homework assignment with very little intellectual effort." Of course, many would argue that this is a failure of the curriculum design--but it will inevitably take time to catch up.
-
@krismicinski @shriramk @jfdm @csgordon @lindsey @jeremysiek we have been claiming for decades that we are not just educating coding monkeys, so it shouldn't really matter that LLMs can now do all the coding. As far as I see it, it's still necessary to identify and clearly formulate verifiable requirements and specifications, come up with a modular design, and verify the whole thing, because I still believe the ultimate responsibility lies with the developer. So students still need to understand the fundamentals. But yes, it has become much harder to check *at scale* whether they actually grasped them.
-
@krismicinski @shriramk @jfdm @csgordon @lindsey @jeremysiek goodness, I hope that prompting an LLM will not be a *huge* part of software engineering going forward. It's an incredibly inefficient way to go about the task. Frankly, I'm amazed at just how shoddy the current set of tools is.
-
@krismicinski @shriramk @jfdm @csgordon @lindsey @jeremysiek really?! That's depressing.
I find these things _maddening_ to use. It feels like trying to neatly typeset your ideas by dragging wet toilet paper dipped in ink across a piece of sandpaper.
Are they capable of some impressive things? Yes. Do I think they're a good tool as an augment for a sophisticated user to go faster? Honestly, not really. The NLP aspect is neat; the multiple round-trips through English to <whatever it does internally> back to English are excruciatingly slow, expensive, and inefficient. It's not a good use of my, or frankly the machine's, time, let alone electrical power or water.
Case in point: some colleagues the other day were saying something like, "I just can't get it to use `jq` instead of writing little Python scripts to process JSON... Here's what I put in my CLAUDE.md file: <some sentence along the lines of, 'prefer jq for working with json'>." I couldn't help but feel like this is exactly the sort of thing where you want the concise precision of a small DSL for assigning weights to tools (and providing templates for those tools' use) to drive how the agent uses them. But you can't do that, because the agent only trades in text.
Like I said, there's clearly a "there" there. But setting aside the moral and ethical issues for a moment, that doesn't mean that the present model of interaction is _good_, let alone that it can't be substantially _better_.
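To make the "small DSL for assigning weights to tools" idea above concrete, here is a purely hypothetical sketch. Nothing in it reflects what any current agent accepts: `ToolRule`, `preferred`, the weights, and the command templates are all invented for illustration. The point is only that the preference becomes data a program can check, rather than a sentence the agent may or may not heed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolRule:
    tool: str       # executable name
    task: str       # kind of task the rule applies to
    weight: float   # higher means more preferred (hypothetical scale)
    template: str   # command template for invoking the tool

# Hypothetical rule set: strongly prefer jq over ad hoc Python for JSON.
RULES = [
    ToolRule("jq", "json", 0.9, "jq '{filter}' {path}"),
    ToolRule("python3", "json", 0.1, "python3 -c '{script}' {path}"),
]

def preferred(task, rules=RULES):
    """Return the highest-weight rule registered for `task`."""
    candidates = [rule for rule in rules if rule.task == task]
    if not candidates:
        raise LookupError(f"no tool registered for task {task!r}")
    return max(candidates, key=lambda rule: rule.weight)

rule = preferred("json")
print(rule.template.format(filter=".name", path="data.json"))
# prints: jq '.name' data.json
```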
@cross @krismicinski @shriramk @jfdm @csgordon @jeremysiek It seems like folks sooner or later notice that this whole "the agent only trades in text" thing is Not Great and proceed to reinvent programming languages on top of it. So, you know, when that happens, we PL educators are here to try to help them not accidentally implement dynamic scope or whatever.
-
@krismicinski @cross @shriramk @jfdm @csgordon @jeremysiek Kris, I feel like any time I say anything to you on here, you say, "I agree with you." Are you actually an LLM?