[x y] = [y x]. In my case I also need to remove numbers and garbage (like dates, 'Yes', 'No'). In other words, combinations like ['Bank of America' 'loan product'] and ['loan product' 'Bank of America'] must be counted only once, not twice. My initial idea of using a hash map was not efficient enough. Instead, I treat the set of pairs as a matrix [x y], where every element [x y]ij is produced by multiplying the two vectors x and y. In this square matrix all elements above the diagonal are duplicates of the elements below it, [x y]12 = [x y]21, because x and y come from the same collection (see the example from this blog: How to visualise big CSV files; one such pair from that file is ['Bank of America' 'loan product'] and ['loan product' 'Bank of America']). To avoid counting each pair twice, only the elements on one side of the diagonal are kept, using the condition [x y]ij where i < j. The total number of combinations from the CSV file that I process follows this formula:
which equals:
670760991810000. Even after removing the garbage, the calculation still cannot finish in any reasonable time.
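To make the i < j condition concrete, here is a minimal sketch that walks only one side of the diagonal by index. The name `upper-pairs` is hypothetical and not part of the original code. For n items it yields n(n-1)/2 pairs instead of n², which is a general property of unordered pairs and not necessarily the formula used for the CSV file above.

```clojure
(defn upper-pairs
  "Returns a lazy seq of [xi xj] for every i < j in coll.
  Hypothetical helper, used only to illustrate the i < j condition."
  [coll]
  (let [v (vec coll)
        n (count v)]
    (for [i (range n)
          j (range (inc i) n)]   ; j always starts past i, so i < j holds
      [(v i) (v j)])))

;; Each unordered pair appears exactly once
(println (upper-pairs ["a" "b" "c" "d"]))

;; 100 items give 100 * 99 / 2 = 4950 pairs
(println (count (upper-pairs (range 100))))
```

Iterating over indices also avoids looking up the position of each element again for every candidate pair.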
(ns clojure.examples.cartesianpairs
  (:gen-class))

(defn cartesian-pairs
  "Reads coll and returns a lazy seq of all pairs xi, xj (as strings \":xixj\")
  where i < j, xi is not equal to xj, and neither element is a number."
  [coll]
  (for [x coll
        y coll
        :when (not= x y)
        :when (and (not (number? x)) (not (number? y)))
        ;; keep only one side of the diagonal: x must come before y in coll
        :when (< (.indexOf coll x) (.indexOf coll y))]
    (str ":" x y)))
(def my-coll [123 45 "b" "f" "d" "e" 'f 123 1234 4534])

;(for [l '(1 2 3)] (println "Hi") ) ;(swap! collection assoc (str ":" x ) 1) )

;; Print the raw list of pairs
(->> my-coll
     cartesian-pairs
     println)

;; Collapse the pairs into a set to drop duplicates
(println (reduce conj #{} (cartesian-pairs my-coll)))

;; Count how many times each pair occurs
(println
  (reduce #(assoc %1 %2 (inc (%1 %2 0)))
          {}
          (cartesian-pairs my-coll)))

;(println @collection)
;(println (into {} #{(cartesian-pairs my-coll)} ))
; (println (get (into {} #{(cartesian-pairs my-coll)} ) ":[b f]") ) ;(println collection)
The output shows the raw list of pairs, then the set without duplicates, and finally the hash map of Cartesian pairs, which contains no duplicate keys:
(:bf :bd :be :bf :fd :fe :ff :de :df :ef)
#{:fe :bf :df :be :de :bd :ff :fd :ef}
{:bf 2, :bd 1, :be 1, :fd 1, :fe 1, :ff 1, :de 1, :df 1, :ef 1}
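As a side note, the two `reduce` forms used above appear to match what the core functions `set` and `frequencies` already do. The sketch below checks that on a small hand-written sample; the name `sample-pairs` is made up for this illustration.

```clojure
;; Hypothetical sample of pair strings, in the same ":xy" style as above
(def sample-pairs [":bf" ":bd" ":be" ":bf" ":fd"])

;; Same result as (reduce conj #{} sample-pairs)
(println (set sample-pairs))

;; Same result as the reduce/assoc counting form
(println (frequencies sample-pairs))

;; Both comparisons print true
(println (= (set sample-pairs)
            (reduce conj #{} sample-pairs)))
(println (= (frequencies sample-pairs)
            (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} sample-pairs)))
```

Using the core functions keeps the intent readable: `set` removes duplicates, `frequencies` counts them.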

