reset password
Author Message
abansal
Posts: 9
Posted 01:50 Jul 21, 2009 |

S1: <P1,P2,P5,P2,P7,P3>
S2: <P1,P4,P5,P1,P6,P7>
S3: <P1,P6,P1,P4>
S4: <P5,P4,P1,P6,P7>
S5: <P5,P3>
S6: <P1,P2,P7,P3>
S7: <P2,P7>
S8: <P1,P2,P4,P1,P2,P6,P7,P3>

min_sup=3

Applying GSP Theorem:

 

C1           Support       L1

-----------------------------------

P1               7            P1

P2               4            P2

P3               4            P3

P4               4            P4

P5               4            P5

P6               4            P6

P7               6            P7

 

Please note that even though some pages exist more than once in a session their support is counted as one

 

Normally in GSP we would calculate all the subsets. Since here only certain page navigation paths are valid. Only the valid paths should be considered as candidates

 

C2           Support       L2

-----------------------------------

P1P2            3            P1P2

P1P4            2

P1P6            3            P1P6

P2P4            1

P2P5            1

P2P6            1

P2P7            3            P2P7

P3P1            0

P4P1            2

P4P5            1

P5P1            1

P5P3            1

P6P1            1

P6P7            3            P6P7

P7P1            0

P7P3            3            P7P3

 

Please note that even the paths like P1P2 that occurs twice in session S8, support has only been counted once. Though it won't effect result for this data

 

 

L2                C3                   Support       L3

-------------------------------------------------------

P1P2            P1P2P7                1

P1P6            P1P6P7                3           P1P6P7

P2P7            P2P7P3                2

P6P7            P6P7P3                1

P7P3

 

Please notice that for normal GSP, we will pruine candidates before testing them for support. But in this case we do not pruine as its a path from one to other. For example in P1P6P7 candidate set, if we do pruining , it should be pruined as the subset P1P7 is not frequent but since P1P7 doesn't matter and only transactions  from p1 to p6 and further to p7 matters so pruining is skipped.

 

Hence Frequent Paths are:

 

P1--->P2

P1--->P6     

P2--->P7

P6--->P7

P7--->P3

P1--->P6--->P7

Last edited by abansal at 02:46 Jul 21, 2009.
abansal
Posts: 9
Posted 02:19 Jul 21, 2009 |

a

Last edited by abansal at 02:46 Jul 21, 2009.
arctica82
Posts: 10
Posted 22:55 Jul 22, 2009 |

By applying GSP algorithm: 

 

C1 Support_Count L1
P1 6 <P1>
P2 4 <P2>
P3 4 <p3>
P4 4 <p4>
P5 4 <p5>
P6 4 <p6>
P7 6 <p7>

 

 

 

C2 Support_Count L2
<p1,p2> 3 <p1,p2>
<p1,p2> 3 <p1,p2>
<p1,p3> 3 <p1,p3>
<p1,p4> 3 <p1,p4>
<p1,p5> 2  
<p1,p6> 4 <p1,p6>
<p1,p7> 5 <p1,p7>
<p2,p1> 0  
<p2,p2> 2  
<p2,p3> 3 <p2,p3>
<p2,p4> 1  
<p2,p5> 1  
<p2,p6> 1  
<p2,p7> 3 <p2,p7>
<p3,p1> 0  
<p3,p2> 0  
<p3,p3> 0  
<p3,p4> 0  
<p3,p5> 0  
<p3,p6> 0  
<p3,p7> 0  
<p4,p1> 3 <p4,p1>
<p4,p2> 1  
<p4,p3> 1  
<p4,p4> 0  
<p4,p5> 1  
<p4,p6> 3 <p4,p6>
<p4,p7> 3 <p4,p7>
<p5,p1> 2  
<p5,p3> 2  
<p5,p4> 1  
<p5,p5> 0  
<p5,p6> 2  
<p5,p7> 3 <p5,p7>
<p6,p1> 1  
<p6,p2> 1  
<p6,p3> 1  
<p6,p4> 1  
<p6,p5> 0  
<p6,p6> 0  
<p6,p7> 2  
<p7,p1> 0  
<p7,p2> 0  
<p7,p3> 3 <p7,p3>
<p7,p4> 0  
<p7,p5> 0  
<p7,p6> 0  
<P7,p7> 0  

 

C3 Support_Count L3 C4  Support_Count L4
<p1,p1,p2> 0   <p1,p2,p7,p3> 3 <p1,p2,p7,p3>
<p1,p1,p3> 1   <p4,p1,p7,p3> 1  
<p1,p1,p4> 1  
<p1,p1,p6> 2  
<p1,p1,p7> 2  
<p1,p2,p3> 3 <p1,p2,p3>
<p1,p2,p7> 3 <p1,p2,p7>
<p1,p3,p7> 0  
<p1,p4,p1> 0  
<p1,p4,p6> 2  
<p1,p7,p3> 3 <p1,p7,p3>
<p2,p3,p7> 0  
<p2,p7,p3> 3 <p2,p7,p3>
<p4,p1,p1> 0  
<p4,p1,p2> 1  
<p4,p1,p3> 1  
<p4,p1,p4> 0  
<p4,p1,p6> 3 <p4,p1,p6>
<p4,p1,p7> 3 <p4,p1,p7>
<p3,p7,p3> 0  
<p5,p7,p3> 1          

 

So this sequence is frequent and also all of it's subsequence are freuent:

P1->P2->P7->P3

P1->P2

P1->P2->p7

P2->p7

P1->P3

P2->P3

P7->P3

P1->P2->P3

p2->P7-P3

 

abansal
Posts: 9
Posted 12:50 Jul 23, 2009 |

I disagree with arctica8' s solution. The solution he/she provided is application of GSP in general on all the supbsets which is not what problem is asking for. Problem asks for frequently traversed paths/page navigations. 

 

If you look at this solution it ives path like P1->P2->P7->P3... If you go through data there is only one instance S6 that actually follows this path ... other two instances like S1 have many other pages in between hence user doesn't directly travel from p2 to p7.

 

Some other patterns won't satisfy the criteria too....

xieguahu
Posts: 50
Posted 19:06 Jul 23, 2009 |

I some kind agree with abansa.  But I think when do the support_count, the sequence does not necessarily be adjacent to each other.

C1    Support_count   L1

P1            6                    <p1>

P2            4                    <p2>

P3            4                    <p3>

P4            4                    <p4>

P5            4                    <p5>

P6            4                    <p6>

P7            6                     <p7>

 

 

C2               Support_count     L2

<p1, p2>           3                       <p1, p2>

<p1, p4>           3                       <p1, p4>

<p1, p6>           4                       <p1, p6>

<p2, p4>           1

<p2, p5>           1

<p2, p6>           1

<p2, p7>           4                        <p2, p7>

<p3, p1>           0

<p4, p1>           3                        <p4, p1>

<p4, p5>           1

<p5, p1>           2

<p5, p3>           2

<p6, p1>           1

<p6, p7>          3                       <p6, p7>

<p7, p1>          0

<p7, p3>          3                       <p7, p3>

 

 

C3                                              Suppport_count       L3

<p1, p2, p7>                                   3                          <p1, p2, p7>

<p1, p4, p1>                                   2

<p1, p6, p7>                                   3                          <p1, p6, p7>

<p2, p7, p3>                                   3                          <p2, p7, p3>

<p4, p1, p2>                                   1

<p4, p1, p4>                                   0

<p4, p1, p6>                                   3                          <p4, p1, p6>

<p6, p7, p3>                                   1

 

C4                                                 Support_count      L4

<p1, p2, p7, p3>                             3                        <p1, p2, p7, p3>

<p4, p1, p6, p7>                             3                        <p1, p4, p6, p7>

 

Frequent Paths are:

p1-> p2

p1-> p4

p1-> p6

p2 -> p7

p4 -> p1

p6 -> p7

p7 -> p3

p1 -> p2 -> p7

p1 -> p6 -> p7

p2 -> p7 -> p3

p4 -> p1 -> p6

p1 -> p2 -> p7 -> p3

p1 -> p4 -> p6 -> p7

Last edited by xieguahu at 19:12 Jul 23, 2009.
PoonamPawar
Posts: 16
Posted 19:39 Jul 25, 2009 |
abansal wrote:

S1: <P1,P2,P5,P2,P7,P3>
S2: <P1,P4,P5,P1,P6,P7>
S3: <P1,P6,P1,P4>
S4: <P5,P4,P1,P6,P7>
S5: <P5,P3>
S6: <P1,P2,P7,P3>
S7: <P2,P7>
S8: <P1,P2,P4,P1,P2,P6,P7,P3>

min_sup=3

Applying GSP Theorem:

 

C1           Support       L1

-----------------------------------

P1               7            P1

P2               4            P2

P3               4            P3

P4               4            P4

P5               4            P5

P6               4            P6

P7               6            P7

 

Please note that even though some pages exist more than once in a session their support is counted as one

 

Normally in GSP we would calculate all the subsets. Since here only certain page navigation paths are valid. Only the valid paths should be considered as candidates

 

C2           Support       L2

-----------------------------------

P1P2            3            P1P2

P1P4            2

P1P6            3            P1P6

P2P4            1

P2P5            1

P2P6            1

P2P7            3            P2P7

P3P1            0

P4P1            2

P4P5            1

P5P1            1

P5P3            1

P6P1            1

P6P7            3            P6P7

P7P1            0

P7P3            3            P7P3

 

Please note that even the paths like P1P2 that occurs twice in session S8, support has only been counted once. Though it won't effect result for this data

 

 

L2                C3                   Support       L3

-------------------------------------------------------

P1P2            P1P2P7                1

P1P6            P1P6P7                3           P1P6P7

P2P7            P2P7P3                2

P6P7            P6P7P3                1

P7P3

 

Please notice that for normal GSP, we will pruine candidates before testing them for support. But in this case we do not pruine as its a path from one to other. For example in P1P6P7 candidate set, if we do pruining , it should be pruined as the subset P1P7 is not frequent but since P1P7 doesn't matter and only transactions  from p1 to p6 and further to p7 matters so pruining is skipped.

 

Hence Frequent Paths are:

 

P1--->P2

P1--->P6     

P2--->P7

P6--->P7

P7--->P3

P1--->P6--->P7

I think,Ankur is right.

&  P1 support_count  should be 6.

Applying GSP Theorem:

 

C1           Support       L1

-----------------------------------

P1               6            P1

P2               4            P2

P3               4            P3

P4               4            P4

P5               4            P5

P6               4            P6

P7               6            P7


 

PoonamPawar
Posts: 16
Posted 19:42 Jul 25, 2009 |
abansal wrote:

S1: <P1,P2,P5,P2,P7,P3>
S2: <P1,P4,P5,P1,P6,P7>
S3: <P1,P6,P1,P4>
S4: <P5,P4,P1,P6,P7>
S5: <P5,P3>
S6: <P1,P2,P7,P3>
S7: <P2,P7>
S8: <P1,P2,P4,P1,P2,P6,P7,P3>

min_sup=3

Applying GSP Theorem:

 

C1           Support       L1

-----------------------------------

P1               7            P1

P2               4            P2

P3               4            P3

P4               4            P4

P5               4            P5

P6               4            P6

P7               6            P7

 

Please note that even though some pages exist more than once in a session their support is counted as one

 

Normally in GSP we would calculate all the subsets. Since here only certain page navigation paths are valid. Only the valid paths should be considered as candidates

 

C2           Support       L2

-----------------------------------

P1P2            3            P1P2

P1P4            2

P1P6            3            P1P6

P2P4            1

P2P5            1

P2P6            1

P2P7            3            P2P7

P3P1            0

P4P1            2

P4P5            1

P5P1            1

P5P3            1

P6P1            1

P6P7            3            P6P7

P7P1            0

P7P3            3            P7P3

 

Please note that even the paths like P1P2 that occurs twice in session S8, support has only been counted once. Though it won't effect result for this data

 

 

L2                C3                   Support       L3

-------------------------------------------------------

P1P2            P1P2P7                1

P1P6            P1P6P7                3           P1P6P7

P2P7            P2P7P3                2

P6P7            P6P7P3                1

P7P3

 

Please notice that for normal GSP, we will pruine candidates before testing them for support. But in this case we do not pruine as its a path from one to other. For example in P1P6P7 candidate set, if we do pruining , it should be pruined as the subset P1P7 is not frequent but since P1P7 doesn't matter and only transactions  from p1 to p6 and further to p7 matters so pruining is skipped.

 

Hence Frequent Paths are:

 

P1--->P2

P1--->P6     

P2--->P7

P6--->P7

P7--->P3

P1--->P6--->P7

I think,Ankur is right.

But P1 support_count should be 6

Applying GSP Theorem:

 

C1           Support       L1

-----------------------------------

P1               6            P1

P2               4            P2

P3               4            P3

P4               4            P4

P5               4            P5

P6               4            P6

P7               6            P7

 

PoonamPawar
Posts: 16
Posted 22:00 Jul 25, 2009 |

 

I think, Ankur is right.

But support_count of P1 should be 6.

 

Applying GSP Theorem:

 

C1           Support       L1

-----------------------------------

P1               6            P1

P2               4            P2

P3               4            P3

P4               4            P4

P5               4            P5

P6               4            P6

P7               6            P7

HelloWorld
Posts: 88
Posted 23:33 Jul 25, 2009 |

Here's my version.. Check it out.. I'm wondering if you guys look at the power point slide about this algorithm?

It also listed the path to itself, so for example:

<P1>

<P2>

on the next run, it will also combine

<P1, P1>

<P1, P2>

<P2, P1>

<P2, P2>

then I'm assuming that in this algorithm, unlike apriori, the sequence matters.. and that's why it's called "The Generalized Sequential Patterns" algorithm..

Attachments:
cysun
Posts: 2935
Posted 14:29 Jul 26, 2009 |

None of the solutions so far seems to be completely correct.

HelloWorld is correct that sequences like <Pi,Pi> need to be included. There are some miscounts in HelloWorld's solution.

As for frequent patterns vs. actual navigation paths, my intention is to discover all frequent patterns, which are subsequences of the actual navigation histories, but note that the subsequence definition does not require the events to be adjacent as in the original sequence. Sorry for causing some confusion.

 

xieguahu
Posts: 50
Posted 17:00 Jul 26, 2009 |
I tried, but I think HelloWorld's caculation is correct

cysun wrote:

None of the solutions so far seems to be completely correct.

HelloWorld is correct that sequences like <Pi,Pi> need to be included. There are some miscounts in HelloWorld's solution.

As for frequent patterns vs. actual navigation paths, my intention is to discover all frequent patterns, which are subsequences of the actual navigation histories, but note that the subsequence definition does not require the events to be adjacent as in the original sequence. Sorry for causing some confusion.

 

 

alomo
Posts: 70
Posted 19:03 Jul 26, 2009 |

According to the topology of the website, some hyperlinks do not exist:

P5 -> P2

P5 -> P4

However, two sequences (S1 and S4) contain these impossible paths. Is that case, do we have to consider whole sequences (S1 and S4) or only the topologically possible part of them?

cysun
Posts: 2935
Posted 19:46 Jul 26, 2009 |
xieguahu wrote:
I tried, but I think HelloWorld's caculation is correct

cysun wrote:

None of the solutions so far seems to be completely correct.

HelloWorld is correct that sequences like <Pi,Pi> need to be included. There are some miscounts in HelloWorld's solution.

As for frequent patterns vs. actual navigation paths, my intention is to discover all frequent patterns, which are subsequences of the actual navigation histories, but note that the subsequence definition does not require the events to be adjacent as in the original sequence. Sorry for causing some confusion.

I think you are right. I must have miscounted. Sorry about that.

HelloWorld
Posts: 88
Posted 19:47 Jul 26, 2009 |
cysun wrote:
xieguahu wrote:
I tried, but I think HelloWorld's caculation is correct

cysun wrote:

None of the solutions so far seems to be completely correct.

HelloWorld is correct that sequences like <Pi,Pi> need to be included. There are some miscounts in HelloWorld's solution.

As for frequent patterns vs. actual navigation paths, my intention is to discover all frequent patterns, which are subsequences of the actual navigation histories, but note that the subsequence definition does not require the events to be adjacent as in the original sequence. Sorry for causing some confusion.

I think you are right. I must have miscounted. Sorry about that.

yay, i got the point right? :D

cysun
Posts: 2935
Posted 19:52 Jul 26, 2009 |
alomo wrote:

According to the topology of the website, some hyperlinks do not exist:

P5 -> P2

P5 -> P4

However, two sequences (S1 and S4) contain these impossible paths. Is that case, do we have to consider whole sequences (S1 and S4) or only the topologically possible part of them?

Those could be caused by a user clicking the back button of the browser, so they are not "impossible". Generally speaking, any browsing history is possible. For example, I can send you two URLs by email. If you click the URLs, the server will log two requests, and the two pages don't have to have hyperlinks between each other.

cysun
Posts: 2935
Posted 19:52 Jul 26, 2009 |
HelloWorld wrote:
cysun wrote:
xieguahu wrote:
I tried, but I think HelloWorld's caculation is correct

cysun wrote:

None of the solutions so far seems to be completely correct.

HelloWorld is correct that sequences like <Pi,Pi> need to be included. There are some miscounts in HelloWorld's solution.

As for frequent patterns vs. actual navigation paths, my intention is to discover all frequent patterns, which are subsequences of the actual navigation histories, but note that the subsequence definition does not require the events to be adjacent as in the original sequence. Sorry for causing some confusion.

I think you are right. I must have miscounted. Sorry about that.

yay, i got the point right? :D

You certainly do. Smile

HelloWorld
Posts: 88
Posted 19:53 Jul 26, 2009 |
cysun wrote:
HelloWorld wrote:
cysun wrote:
xieguahu wrote:
I tried, but I think HelloWorld's caculation is correct

cysun wrote:

None of the solutions so far seems to be completely correct.

HelloWorld is correct that sequences like <Pi,Pi> need to be included. There are some miscounts in HelloWorld's solution.

As for frequent patterns vs. actual navigation paths, my intention is to discover all frequent patterns, which are subsequences of the actual navigation histories, but note that the subsequence definition does not require the events to be adjacent as in the original sequence. Sorry for causing some confusion.

I think you are right. I must have miscounted. Sorry about that.

yay, i got the point right? :D

You certainly do. Smile

Awesome! Laughing