[jacking off motion] great 🙄
-
But it was like a breath of fresh air to finish with the schema work, then point `datamodel-code-generator` (a deterministic tool) at those files and generate corresponding Pydantic model code for the schema.
Not only was that Python code *much* easier to review (not the model's fault, JSONSchema is just very difficult for me to read) but it was just… so nice to be touching a tool whose behavior I can rely on. I didn't need to check its work very closely at all, because I know that it's applying a fixed set of well-characterized rules to generate those models from the (equally well-defined) schema.

@SnoopJ I like datamodel-code-generator, but did have a few utterly baffling sessions with it that turned out to be due to a missing deepcopy call: https://github.com/koxudaxi/datamodel-code-generator/pull/2215
(I shudder to think what nonsense an LLM might spew if it hit a tool bug like that)
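
For anyone who hasn't hit this class of bug: a missing deep copy usually means several objects quietly share one nested mutable value, so a change to one shows up in another. The sketch below is a generic illustration of that failure mode, not the code from the linked PR; `build_fields`, `demo`, and the template dict are all made up for the example.

```python
import copy


def build_fields(names, template, isolate):
    """Create one field definition per name from a shared template."""
    # copy.copy() duplicates only the outer dict; the nested list stays shared.
    copier = copy.deepcopy if isolate else copy.copy
    return {name: copier(template) for name in names}


def demo(isolate):
    template = {"type": "string", "constraints": []}
    fields = build_fields(["id", "email"], template, isolate)
    # Intended to affect "email" only.
    fields["email"]["constraints"].append("format: email")
    return fields["id"]["constraints"]


leaked = demo(isolate=False)  # "id" picks up the email constraint
clean = demo(isolate=True)    # "id" is untouched
```

The baffling part in practice is that the shallow-copy version looks correct at the call site; the symptom only appears on an unrelated object.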
-
@ancoghlan I imagine you've put it to much heavier-duty use than I have, but good to know!
I also shudder to think. I suppose I should find a suitable bug from my own recent past and see how long the path is from the initial report to proper characterization of the issue (if we get there at all)
-
Coming for your job
Yikes. Just the first line of output requires both March 3rd and March 5th to be Tuesday—at least, if you remember obscure facts like “Tuesday is the day after Monday” and “there are 7 days in a week.”
Truly this application has a dizzying intellect and can be completely trusted with any other modular arithmetic it might stumble across.
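
The arithmetic in question is small enough to check mechanically. A minimal sketch, with a helper name of my own choosing and an arbitrary range of years for the cross-check:

```python
from datetime import date


def same_weekday(day_a: int, day_b: int) -> bool:
    """Two days of the same month share a weekday only if they are a multiple of 7 apart."""
    return (day_b - day_a) % 7 == 0


# March 3rd and March 5th are 2 days apart, so they can never both be Tuesday.
both_tuesday_possible = same_weekday(3, 5)

# Cross-check against real calendars: years in which the two dates share a weekday.
conflicts = [
    year
    for year in range(2000, 2031)
    if date(year, 3, 3).weekday() == date(year, 3, 5).weekday()
]
```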
-
gave the bullshit machine another gentle pitch, fed it the report of new bug and asked for an explanation
it did correctly point to where the originating flaw was (`data=…` instead of `json=…` in `requests`), and from there I hit pause and explored possible explanations for why that suddenly mattered (FastAPI 0.132 has a breaking change associated with the "wrong" `Content-Type` header)
then, already knowing what the problem was, I asked for an *explanation* rather than "oh look at this code". it took several cycles of incorrect confabulation, even with very explicit hints ("I am sure the Pydantic version has not changed." "FastAPI is not bounded above, check the release notes"), to get to an explanation that could be called correct.
I don't know how to evaluate how much faster that got me through the fog-of-war that is Pydantic's absolutely terrible error reporting, but I do know that the number of potential pitfalls on the far side of that is… not something that inspires faith.
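
For readers who haven't run into this: in `requests`, `json=` serializes the payload and sets `Content-Type: application/json`, while `data=` does not mark the body as JSON. A server that only parses JSON when the header says so will then see the very same bytes as a plain string. The sketch below is a toy model of that behavior using only the standard library; `decode_body` and `validate_item` are hypothetical names, and this is not FastAPI's or Pydantic's actual code.

```python
import json


def decode_body(raw: bytes, content_type):
    """Toy model: parse the body as JSON only when the header declares JSON."""
    if content_type is not None:
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "application/json":
            return json.loads(raw)
    # Without a JSON content type, the body is passed along as text.
    return raw.decode("utf-8")


def validate_item(body):
    """Stand-in for model validation: only a JSON object (dict) is accepted."""
    if not isinstance(body, dict):
        raise TypeError(f"expected an object, got {type(body).__name__}: {body!r}")
    return body


raw = json.dumps({"name": "widget"}).encode("utf-8")

# Same bytes, declared as JSON: validation sees a dict.
parsed = validate_item(decode_body(raw, "application/json"))

# Same bytes, no JSON header: validation sees a quoted string instead.
try:
    validate_item(decode_body(raw, None))
except TypeError as exc:
    error = str(exc)
```

The only visible clue in the failure is the pair of quotes around the payload in the message, which is easy to miss in a long validation error.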
-
this kind of use at least puts *some* limit on the blast radius
this experiment and another (a descriptive task mentioned up-thread) convince me that there are still many pitfalls when using such a model to guide this kind of exploratory work
but at least in that case, the user of the tool is the one who is having their time wasted the most, and bullshit does not as easily make it into the codebase.
assuming the tool has been instructed not to generate any and the user has not directly subverted that, anyway. which is of course a big assumption: sufficient Enthusiasm will (does) clearly lead to this kind of subversion, and to lying about it
-
To identify and explain the problem, and generate (but NOT apply) diffs for this issue (which can be fixed in -2+2, which I wrote before asking), the model consumed:
2,900,000 tokens
5 min 30 seconds of API time
12 "premium" requests
-
oh also, when I asked for an explanatory comment to be added, I did have to instruct the model not to masquerade as a human author when populating the tags required by the project's comment convention
so that's great
-
@SnoopJ Bit of a secret, not sure how open it is, but the models rely very heavily on error message quality in their little fixup loops -- if the error message doesn't basically spell it out, it's not like the model has, y'know, a theory of how to model computers
-
@delta_vee it actually did fine with this part of the problem, one-shotted it more or less instantly. spotted the subtle difference between `{…}` and `'{…}'` in the report and correctly diagnosed that the JSON payload was not being treated as JSON.
But the working-backwards from there (`Content-Type` is wrong, request machinery is responsible, why did it break/why did it work before?) was not great.
Or do you think the error message quality for 'cascade' failures like this still matters that far into the "reasoning" process?
-
@SnoopJ The error quality for cascade failures like this is even more important when it comes to finding the root cause