rabbott
Posts: 1649
Posted 10:41 Sep 26, 2019 |

A naive knapsack algorithm builds a solution as follows. It starts with a tentative solution that contains no items. It then steps through the list of items; for each one it either adds that item to the tentative solution or leaves it out, doubling the number of tentative solutions under consideration at every step. Tentative solutions that still need to be expanded are kept on a queue. Every time a tentative solution is taken off the queue, two are added back: one that adds the next item in the list to the tentative solution and one that doesn't. At the end, every subset of items has been considered; the best feasible one (total weight within the capacity) is the solution. This algorithm is exponential: O(2^n).
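Here is a minimal sketch of that enumeration, assuming items are (value, weight) pairs; none of these names come from the course code.

```python
from collections import deque

def knapsack_brute_force(items, capacity):
    # Each queue entry: (index of next item to decide, value, weight, chosen items).
    queue = deque([(0, 0, 0, [])])
    best_value, best_items = 0, []
    while queue:
        i, value, weight, chosen = queue.popleft()
        if i == len(items):
            # A complete subset: keep it if it's feasible and better.
            if weight <= capacity and value > best_value:
                best_value, best_items = value, chosen
            continue
        v, w = items[i]
        # Two successors: one that takes items[i] and one that doesn't.
        queue.append((i + 1, value + v, weight + w, chosen + [items[i]]))
        queue.append((i + 1, value, weight, chosen))
    return best_value, best_items
```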

The dynamic programming approach is pseudo-polynomial: O(n*c), where c is the capacity. The algorithm fills a table of c rows and n columns. Why is that "pseudo"? Because the size of the table is not fixed by the number of items: it grows with the numeric value of c, which can be exponentially large relative to the number of digits used to write it down. Even though O(n*c) seems like it should be fast, it is very slow on large problems. If the table has, say, 1 billion cells, the time to fill it is O(k * 1,000,000,000) for some constant k, which may look linear, but it is linear with an enormous constant factor.
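For concreteness, here is a sketch of the table-filling version under the same (value, weight) convention; the indexing scheme is just one common choice.

```python
def knapsack_dp(items, capacity):
    n = len(items)
    # table[w][j] = best value achievable with capacity w using the first j items.
    table = [[0] * (n + 1) for _ in range(capacity + 1)]
    for j in range(1, n + 1):
        value, weight = items[j - 1]
        for w in range(capacity + 1):
            table[w][j] = table[w][j - 1]        # skip item j
            if weight <= w:                      # or take it, if it fits
                table[w][j] = max(table[w][j],
                                  table[w - weight][j - 1] + value)
    return table[capacity][n]
```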

The branch-and-bound algorithm solves the speed problem by filtering out tentative solutions that would otherwise be put onto the queue. The better the filter, the faster the algorithm runs, since every tentative solution that is not put onto the queue eliminates an entire subtree of possibilities. The branch-and-bound algorithm uses various filtering techniques. If you can think of additional good ones, the algorithm will be even faster. The trick is to avoid filtering out tentative solutions that are needed to build the actual best solution.
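One standard filter is the fractional-relaxation bound: the most a tentative solution could possibly reach if fractions of items were allowed. Here is a sketch, assuming the items are sorted by density (value/weight), densest first; the function name is mine, not the repo's.

```python
def fractional_upper_bound(items, i, value, weight, capacity):
    # Optimistically fill the remaining capacity with items[i:], taking a
    # fraction of the first item that doesn't fit.  Requires items sorted
    # by density, densest first, with positive weights.
    bound, room = value, capacity - weight
    for v, w in items[i:]:
        if w <= room:
            bound, room = bound + v, room - w
        else:
            bound += v * room / w    # fractional share of the next item
            break
    return bound
```

No feasible extension of the tentative solution can beat this bound, so any tentative solution whose bound doesn't exceed the best value found so far can safely be dropped.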

Another technique for speeding up the branch-and-bound algorithm is to find a good prioritization of the queue, e.g., depth-first (LIFO), breadth-first (FIFO), best-first (by density, value, or something else), etc. Which prioritization scheme works best? The order matters because the earlier the algorithm finds good solutions, the more tentative solutions those good solutions can filter out before they are ever put onto the queue.
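A small sketch of how those orderings can be swapped in, using Python's heapq (a min-heap, so priorities are negated); the key function is the knob to experiment with, and the class is illustrative, not from the repo.

```python
import heapq
from itertools import count

class PriorityQueue:
    def __init__(self, key):
        self.key = key        # e.g., lambda node: node_value(node) for best-first by value
        self.heap = []
        self.tie = count()    # tiebreaker so heapq never compares nodes directly

    def push(self, node):
        heapq.heappush(self.heap, (-self.key(node), next(self.tie), node))

    def pop(self):
        return heapq.heappop(self.heap)[-1]

# A plain list used as a stack gives LIFO (depth-first);
# collections.deque with popleft() gives FIFO (breadth-first).
```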

The branch-and-bound algorithm always keeps track of the best solution found so far. That way it can filter out tentative solutions that cannot be extended to beat it. The initial "best" is the solution with no items, which, of course, is not a very good one. As the algorithm looks at tentative solutions, it sees better and better potential solutions. As indicated, the sooner it finds good tentative solutions, the more it can filter out.
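Putting the pieces together, here is a sketch of a depth-first main loop that maintains the best-so-far and uses the fractional_upper_bound filter from above; again, the names are mine, not the repo's.

```python
def knapsack_bb(items, capacity):
    # Assumes items are (value, weight) pairs sorted by density, densest first.
    best_value, best_items = 0, []           # initial "best": no items
    stack = [(0, 0, 0, [])]                  # LIFO stack = depth-first order
    while stack:
        i, value, weight, chosen = stack.pop()
        if weight > capacity:
            continue                         # infeasible: prune
        if value > best_value:
            best_value, best_items = value, chosen   # new best-so-far
        if i == len(items):
            continue
        # Filter: drop subtrees that cannot beat the best-so-far.
        if fractional_upper_bound(items, i, value, weight, capacity) <= best_value:
            continue
        v, w = items[i]
        stack.append((i + 1, value, weight, chosen))                        # skip item i
        stack.append((i + 1, value + v, weight + w, chosen + [items[i]]))   # take item i
    return best_value, best_items
```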

With that in mind, how about this: first run the greedy_by_density algorithm and use what it finds as the initial best for a standard branch-and-bound algorithm. The greedy_by_density algorithm is O(n * log(n)) (because you must sort the items by density first), and since the branch-and-bound algorithm on GitHub sorts the items by density before starting anyway, this should add very little to the overall time. Because greedy_by_density generally finds a good if not optimal solution, this might work well. I haven't tried this. If you try it, let me know how well it works. (If you order the queue by value while using the branch-and-bound algorithm on GitHub, this might accomplish the same thing. Does it?)
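A sketch of the seeding idea, with a minimal greedy_by_density of my own (the version in the GitHub repo may differ):

```python
def greedy_by_density(items, capacity):
    # Take items densest-first as long as they fit.  O(n log n) for the sort.
    # Assumes positive weights.
    chosen, value, weight = [], 0, 0
    for v, w in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight + w <= capacity:
            chosen.append((v, w))
            value, weight = value + v, weight + w
    return value, chosen

# Seeding: start knapsack_bb's incumbent from the greedy result instead of
# the empty solution, e.g., best_value, best_items = greedy_by_density(...).
```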

Be aware, though, that ideas that seem good at first glance don't always work out. Don't be surprised if you're disappointed when you try this or some other new idea! For example, prioritizing the queue by value may find the greedy_by_density solution quickly, but it may not be a good way to prioritize the queue in general.

Last edited by rabbott at 12:19 Sep 26, 2019.