View Forum Topic

Forums
CS4635
Reinforcement Learning, constraint systems, and tradition...

Author	Message
rabbott Posts: 1649	Posted 04:11 Apr 28, 2018 \| I woke up in the middle of the night thinking about the relationship between RL, traditional AL, and constraint systems of the sort that Weronika was talking about. (I'll call them CS.) So here are some thoughts and questions. (Since it's the middle of the night, I hope it makes sense.) The underlying mechanism for almost all of AI is search. Isn't that true also of RL and CS? In both RL and CS one is searching for a path from a current state to a goal state. In all three cases, one interacts with an incompletely known environment and uses information extracted from the environment to guide the search. One of the strengths of CS systems is that they give you ways to build environments with somewhat sophisticated structures. They then use heuristics based on those structures to guide the search. In AI the standard generic approach to search is the A* algorithm. In RL the Q-learning function serves as an alternative to what in traditional A* is generally a much simpler function. But they both estimate where you are with respect to the goal. From a high enough perspective aren't these all essentially the same thing? CS probably doesn't have a similar estimator for how far you are from the goal, but it uses what it knows about the possible structures in the environment to guide the search. If the preceding makes sense, you should be able to use any of these three (AI, RL, or CS) to solve problems the others are used for. So, for example, how would one use RL to solve the state transition problem Weronika discussed? How would you use either RL or CS to solve a shortest path problem? (The maze-like example I showed a couple of weeks ago uses RL to find a shortest path from a starting point to a goal.) How would you use CS to find the best Tic-Tac-Toe moves? (AI uses minimax and alpha-beta pruning because it knows that the environment is a game tree. Therefore it doesn't have to relay on a more generic A* function.) How about the 15 (or 8) puzzle or the 8-queens problem? These are not overly complex or sophisticated ideas. You would think people in these fields would have explored them by now. What is the literature on this? A side note for the people looking into RL. General Game Planing explores ways to play games with unknown rules. One interacts with an environment in a way similar to RL. (I think the use of Monte Carlo Tree search to estimate where one is in the state space was first applied in this context.) You might look at its website for some interesting games. If you develop an RL approach to Tic-Tac-Toe, can you generalize it to Bidding Tic-Tac-Toe? (I originally saw it on the GGP site, but I can't find it now.) Last edited by rabbott at 12:25 Apr 28, 2018.
wcwir Posts: 75	Posted 11:43 Apr 28, 2018 \| I just read Dr. Abbott's post and my brain hasn't had time to do the kind of thinking about it that occurs in the background (probably the best kind of thinking)... So what I have to contribute right now is "off the top of my head" and may be a useless tangent - but I promise you will at least get a chuckle out of it if you follow a link at the end. My second thought was (I'm popping thoughts off the stack, so the first thought later): in my exploration of specifying constraints and using automatic verifiers, the goal seems to be to make sure that the system you are designing doesn't surprise you - that you know exactly what it does, and how it does it. Sure, you want to ensure that eventually your system does something useful (liveness property). But when you build your system this way you, you limit it from the get go. It takes the wonder out of the act of creation (and perhaps that's why formal methods are not popular... they are like a vegetable that is good for you but tastes bad). So when for my presentation I chose the Die Hard example, I thought, OK, this isn't a good example of how a system engineer would use TLA+, but it is fun and simple, and since I'm dealing with subject matter that makes people yawn, I'll use that one. And the reason I thought it wasn't a good example was that the spec didn't really describe much of a system - it just mathematically modeled things that happened to two jugs as you pour water from one to the other. And the solution to the underlying problem came from what to me looked like "misuse" of the system: you want to end up with 4 gallons in the big jug, but you specify "there are never exactly 4 liters in the the big jug" as a system invariant, and the model checker looks at all that can be done with the two jugs and says "you're wrong! here is a sequence of steps that leads to the big jug holding four gallons, the state you said you never wanted to happen!" To me, this somehow seemed to violate what I thought was the "spirit" of formal methods of system design. But after reading Dr. Abbott's post, I see it differently. We did, after all, get a solution to the problem. It's just that we got it from the interaction of the model checker and the pretty dumb "system" we specified! The model checker is the part that really did the work. And why not use this tool this way to solve the kind of problems we solve in AI or RL? Maybe in some of the problems (like go) the state space of the system is so large that the model checker would take longer than other approaches (or even longer than our lifespan...). But if it can handle checking every possible state of a complex distributed systems like AWS, it should definitely handle tick-tack-toe. So if this approach hasn't been used this way, maybe is because not very many people are open to using formal methods, and those who do aren't particularly creative? (Or maybe it has, and there is literature on it.) OK, now for the promised chuckle... My first thought when I read Dr. Abbott's post and saw "AI" and "constraints" was this: https://twitter.com/jeffclune/status/973605950266331138 I never went on to read about it, but on the face of it, it does seem similar to the Die Hard example: "betcha you can't use a 5 gallon jug and a 3 gallon jug to pour exactly 4 gallons into the 5 gallon jug" TLC model checker: "challenge accepted!"
rabbott Posts: 1649	Posted 12:40 Apr 28, 2018 \| Jeff Clune does very interesting work. He was one of the first to develop "adversarial images" to fool image recognition neural nets. In this example, you take an image of a panda, add noise, and depending on the noise added the neural net thinks it's an image of either a shark or a goldfish. The actual resultant picture looks to us very much like the original.