Implementing a recursive query in Hive using PySpark

I wanted to traverse data in a table that has a decision-tree structure with parent-child relationships. This is very easy to implement in Oracle using CONNECT BY, but Hive does not provide this function, so I implemented it using Hive and PySpark. Below is the code snippet I used. An explanation follows.

    i = 1
    df = spark.sql("select perent_id,child_id from table_for_recursion where perent_id is null ")
    df.createOrReplaceTempView("df")
    spark.sql("insert overwrite table task_recursion_source select * from df")
    spark.sql("insert overwrite table task_recursion_result select * from df")
    while i > 0:
        df1 = spark.sql("select qar.perent_id,qar.child_id from table_for_recursion qar join task_recursion_so...
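To make the level-by-level idea easier to follow, here is a minimal sketch in plain Python with no Spark involved. It is not the code above: the function name `traverse_hierarchy` and the sample rows are hypothetical, and a plain list of `(perent_id, child_id)` tuples stands in for `table_for_recursion`. I am assuming that `frontier` plays the role the name `task_recursion_source` suggests (the rows found in the previous pass) and that `result` plays the role of `task_recursion_result` (everything found so far). The loop starts from the rows whose parent is null, repeatedly joins the children of the current level, and stops when a pass finds nothing new.

```python
def traverse_hierarchy(rows):
    """Collect every (perent_id, child_id) row reachable from the root rows.

    Root rows are those whose parent is None, mirroring the
    "where perent_id is null" seed query.
    """
    # Index rows by parent so each pass can look up children directly.
    children_by_parent = {}
    for parent, child in rows:
        children_by_parent.setdefault(parent, []).append((parent, child))

    # Seed both the working set and the result with the root rows.
    frontier = [row for row in rows if row[0] is None]
    result = list(frontier)
    seen = set(frontier)  # guards against cycles in the data

    # Each pass joins the source rows to the previous level's children.
    while frontier:
        next_frontier = []
        for _, child in frontier:
            for row in children_by_parent.get(child, []):
                if row not in seen:
                    seen.add(row)
                    next_frontier.append(row)
        result.extend(next_frontier)
        frontier = next_frontier  # an empty level ends the loop

    return result


# Hypothetical sample data: one tree rooted at "A" plus one orphan row.
rows = [
    (None, "A"),
    ("A", "B"),
    ("A", "C"),
    ("B", "D"),
    ("X", "Y"),  # never reached, because "X" is not under any root
]

for row in traverse_hierarchy(rows):
    print(row)
```

The `seen` set is a design choice of this sketch: without it, a cycle in the parent-child data would keep the loop running forever, which is also worth guarding against in a Spark loop that stops only when a pass returns no rows.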