reset password
Author Message
rabbott
Posts: 1649
Posted 12:20 Aug 26, 2018 |

Yesterday Luis Fisher said he thought it was possible to generate a word count using a list comprehension. I was skeptical.

Luis was right, but as far as I can tell, the only way to do it is to rely on side effects -- which is not good programming style.

 

def do_side_effect(key, d):
    d[key] = d.get(key, 0) + 1
    return (key, d[key])
    
words = ['a', 'ab', 'a', 'ad', 'sdfsf', 'a', 'sdfsf', 'ab']
counts = {}

steps = [do_side_effect(word, counts) for word in words]
print('steps:', steps)
print('counts:', counts)

=> steps: [('a', 1), ('ab', 1), ('a', 2), ('ad', 1), ('sdfsf', 1), ('a', 3), ('sdfsf', 2), ('ab', 2)]
      counts: {'a': 3, 'ab': 2, 'ad': 1, 'sdfsf': 2}

When run, the list comprehension generates a list of the steps the list comprehension takes. The actual count is stored in counts, which is updated as a side-effect of running the list comprehension, each time do_side_effect is called.

As far as the list comprehension is concerned, the job of do_side_effect is to generate a value to put into the list being constructed. The value do_side_effect returns is a pair, whose first element is the current word, and whose second element is the then-current count of that word. As a side effect, do_side_effect updates the dictionary of counts for that word. This is not the intended use of a list comprehension, but it does job.

Last edited by rabbott at 13:09 Aug 26, 2018.
jungsoolim
Posts: 38
Posted 22:17 Aug 26, 2018 |

Can we put list comprehension in the body of function?

def count_word_freq(words):
    return (dict(zip(words, (words.count(k) for k in words))))

words = ['a', 'ab', 'a', 'ad', 'sdfsf', 'a', 'sdfsf', 'ab']
print(count_word_freq(words))

If so, wouldn't this function be less efficient that the function using a loop?

(i.e.)

def word_frequencies(mylist):
    result = {}
    for k in mylist:
        result[k] = result.setdefault(k, 0) + 1
    return result

print(word_frequencies(words))

rabbott
Posts: 1649
Posted 22:51 Aug 26, 2018 |

Definitely less efficient. It's O(n^2) since you have to scan words for each word. Even if you did something like what Jay did, it's still O(n^2).

def count_word_freq(words):
    return {w:words.count(w) for w in set(words)}

words = ['a', 'ab', 'a', 'ad', 'sdfsf', 'a', 'sdfsf', 'ab']
print(count_word_freq(words))

Assuming dictionary access is O(1), the side-effect version and the for-loop version are both O(n).