You may want to look into PDDL and its planners, i.e. the field of Automated Planning and Scheduling.
It shares its ancestor, “STRIPS”, with GOAP. PDDL won’t be your solution because it doesn’t support the concept of probability, but the way it works is very similar.
I’m doing my master’s thesis in that field and can give you a brief overview:
Typically a state consists only of booleans, but some planners support numerics too. You then have actions consisting of preconditions and effects.
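To make the precondition/effect idea concrete, here is a minimal sketch in Python (not PDDL itself). It assumes a state is the set of facts that are currently true; the `Action` class, its field names, and the `open-door` example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre_true: frozenset = frozenset()   # facts that must hold
    pre_false: frozenset = frozenset()  # facts that must NOT hold
    add: frozenset = frozenset()        # facts the action makes true
    delete: frozenset = frozenset()     # facts the action makes false

    def applicable(self, state: frozenset) -> bool:
        # All positive preconditions present, no negative one present.
        return self.pre_true <= state and not (self.pre_false & state)

    def apply(self, state: frozenset) -> frozenset:
        # Effects: remove the deleted facts, then add the new ones.
        return (state - self.delete) | self.add


# Hypothetical example action, not taken from any real domain file.
open_door = Action(
    name="open-door",
    pre_true=frozenset({"has-key"}),
    pre_false=frozenset({"door-open"}),
    add=frozenset({"door-open"}),
)

state = frozenset({"has-key"})
next_state = open_door.apply(state)
```

After applying it once, `open_door` is no longer applicable in `next_state`, because `door-open` now holds.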
Now, as you figured, the state space is too big to keep in memory, so the branches of the graph are built lazily, only when the search reaches them.
Then the task is “just” to find a solution on that graph: either an optimal one or any one at all, depending on what you need.
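A rough sketch of that lazy expansion, assuming a toy domain I made up: `successors` is a generator, so neighbouring states only come into existence when the search asks for them. The search itself is a plain breadth-first search with the naive visited set, which returns any shortest plan in number of steps.

```python
from collections import deque

# Hypothetical toy domain: (name, preconditions, add effects, delete effects)
ACTIONS = [
    ("pickup-key", frozenset(), frozenset({"has-key"}), frozenset()),
    ("open-door", frozenset({"has-key"}), frozenset({"door-open"}), frozenset()),
    ("walk-through", frozenset({"door-open"}), frozenset({"outside"}), frozenset()),
]


def successors(state):
    """Lazily yield (action name, next state); the graph is never stored."""
    for name, pre, add, delete in ACTIONS:
        if pre <= state:
            yield name, (state - delete) | add


def bfs_plan(start, goal):
    frontier = deque([(start, [])])
    visited = {start}  # naive loop check on complete states
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:  # every goal fact holds
            return plan
        for name, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None  # no plan exists


plan = bfs_plan(frozenset(), frozenset({"outside"}))
```

Here `plan` comes out as `["pickup-key", "open-door", "walk-through"]`.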
As you also figured, there is the problem of loops, but there are solutions to that.
Naive algorithms just keep track of already visited nodes, but as you also figured, this can be bypassed by going through bigger loops.
When you want the optimal solution and have defined action costs, algorithms such as a heuristic-guided A* won’t get stuck: every trip around a loop adds cost, so the accumulated cost plus the heuristic tells the search that a looping path can no longer lead to the optimal solution.
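As a sketch of why costs help, here is a small A* in Python over a made-up domain that contains a loop (`goto-a-b` / `goto-b-a`). The domain, the costs, and the "missing goal facts" heuristic are all my assumptions; that heuristic is only admissible here because every action adds at most one fact and costs at least 1.

```python
import heapq
from itertools import count

# Hypothetical toy domain: (name, preconditions, add, delete, cost)
ACTIONS = [
    ("goto-a-b", frozenset({"at-a"}), frozenset({"at-b"}), frozenset({"at-a"}), 1),
    ("goto-b-a", frozenset({"at-b"}), frozenset({"at-a"}), frozenset({"at-b"}), 1),
    ("pickup-key", frozenset({"at-b"}), frozenset({"has-key"}), frozenset(), 2),
]


def heuristic(state, goal):
    # Number of goal facts that are still missing.
    return len(goal - state)


def astar(start, goal):
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(start, goal), next(tie), 0, start, [])]
    best_cost = {start: 0}
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if goal <= state:
            return plan, g
        for name, pre, add, delete, cost in ACTIONS:
            if not pre <= state:
                continue
            nxt = (state - delete) | add
            new_g = g + cost
            # Coming back to a known state via a loop is always more
            # expensive than the first visit, so it is dropped here.
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                f = new_g + heuristic(nxt, goal)
                heapq.heappush(frontier, (f, next(tie), new_g, nxt, plan + [name]))
    return None


plan, cost = astar(frozenset({"at-a"}), frozenset({"has-key", "at-a"}))
```

The result is the plan `["goto-a-b", "pickup-key", "goto-b-a"]` with total cost 4; walking back and forth between a and b without the key never gets re-queued.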
There is, however, yet another solution, and that is clever preconditions.
For instance, my action goto ?a ?b would have two preconditions: at ?a and not at ?b.
A negative precondition like the one in this example is a problem for a lot of planners, but this may or may not be true for GOAP.
Either way, this prevents looping effectively, because the looping action simply cannot be taken.
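A minimal sketch of that goto action in Python, assuming the state is a set of ("at", location) facts; the function names and the locations are invented for the example. It only demonstrates what the two preconditions reject.

```python
def can_goto(state, a, b):
    """Preconditions of the hypothetical goto ?a ?b: at ?a and not at ?b."""
    return ("at", a) in state and ("at", b) not in state


def goto(state, a, b):
    if not can_goto(state, a, b):
        raise ValueError(f"goto {a} {b} is not applicable")
    # Effects: no longer at ?a, now at ?b.
    return (state - {("at", a)}) | {("at", b)}


state = frozenset({("at", "kitchen")})
moved = goto(state, "kitchen", "hall")

# The do-nothing move kitchen -> kitchen fails the "not at ?b" precondition.
self_loop_allowed = can_goto(state, "kitchen", "kitchen")
# Once in the hall, the same goto cannot fire again: "at kitchen" is gone.
repeat_allowed = can_goto(moved, "kitchen", "hall")
```

Both `self_loop_allowed` and `repeat_allowed` are `False`, so the planner never even generates those branches.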
For the example of a pickup loop, one could track an inventory size or weight. This even leads into the fancy part of the AI, where it has to consider how much ammo it should pick up if it knows it WILL pick up keys later.
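Sketching the inventory idea in Python, with made-up weights and a made-up `Inventory` class: the pickup precondition is a numeric check against the capacity, which bounds the pickup loop, and reserving room for the key limits how much ammo can be taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventory:
    capacity: int
    items: tuple = ()  # (name, weight) pairs

    def weight(self):
        return sum(w for _, w in self.items)

    def can_pickup(self, weight):
        # Numeric precondition: the item must still fit.
        return self.weight() + weight <= self.capacity

    def pickup(self, name, weight):
        if not self.can_pickup(weight):
            raise ValueError(f"no room for {name}")
        return Inventory(self.capacity, self.items + ((name, weight),))


def max_ammo(inv, ammo_weight, reserved_weight):
    """How many ammo packs fit if reserved_weight must stay free for later."""
    free = inv.capacity - inv.weight() - reserved_weight
    return max(free, 0) // ammo_weight


# Assumed numbers: capacity 10, ammo pack weighs 3, the key weighs 2.
inv = Inventory(capacity=10)
ammo_count = max_ammo(inv, ammo_weight=3, reserved_weight=2)
for _ in range(ammo_count):
    inv = inv.pickup("ammo", 3)
inv = inv.pickup("key", 2)  # still fits because room was reserved
```

With these numbers the agent takes 2 ammo packs instead of 3, and after the key the inventory holds weight 8, so a further ammo pickup is no longer applicable and the loop ends on its own.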
FWIW, I think I would also base my stuff on Behavior Trees, because they are superior to FSMs and other approaches while being capable of implementing everything as a behavior.