[x y] = [y x]. In my case I also need to remove numbers and garbage (like dates, 'Yes', No') . In other words all combinations like this : ['Bank of America' 'loan product'] ['loan product' 'Bank of America'] must meet only once not twice. Initial idea using hash map was not so efficient. I consider pair set as matrix [x y] where every element of this matrix is [x y]ij which was created after multiplication of two vectors x and y . In this square matrix all elements on top of diagonal are duplicates , they are equal to element bellow diagonal [x y]12 = [x y]21 because x,y from the same collection. (see example from this blog: How to visualise big CSV files . This example of one pair from this file: ['Bank of America' 'loan product'] ['loan product' 'Bank of America' ] ). To do not count element on top of diagonal of this matrix next condition will be used: [x y]ij where i <j . Total number of combination from the CSV file which I process will follow this formula:
which equal:
670760991810000 . Even after removal garbage it's still impossible to get calculation finished in finite time.
(ns clojure.examples.cartesianpairs (:gen-class)) (defn cartesian-pairs "Function read col and return list of all possible pairs [ [x1 x2] .. [xi xj] ] where xi not = xj and xi not number " [ coll ] (-> (for [x coll y coll :when (not= x y) :when (and (not (number? x)) ( not (number? y))) :when ( < (.indexOf coll x) (.indexOf coll y)) ] (str ":" x y ) ) ) ) (def my-coll [123 45 "b" "f" "d" "e" 'f 123 1234 4534] ) ;(for [l '(1 2 3)] (println "Hi") ) ;(swap! collection assoc (str ":" x ) 1) ) (->> my-coll cartesian-pairs println ) (println (reduce conj #{} (cartesian-pairs my-coll) )) (println (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} (cartesian-pairs my-coll) ) ) ;(println @collection) ;(println (into {} #{(cartesian-pairs my-coll)} )) ; (println (get (into {} #{(cartesian-pairs my-coll)} ) ":[b f]") ) ;(println collection)The final Result won't have any duplication hash map of Cartesian pairs:
(:bf :bd :be :bf :fd :fe :ff :de :df :ef) #{:fe :bf :df :be :de :bd :ff :fd :ef} {:bf 2, :bd 1, :be 1, :fd 1, :fe 1, :ff 1, :de 1, :df 1, :ef 1}
No comments:
Post a Comment