Spark SQL - IN clause
Tag : scala , By : user90210
Date : November 28 2020, 11:01 PM

This may help those in need. The question: I would like to add a where condition on a column with multiple values in a DataFrame. The method you are looking for is isin:
import sqlContext.implicits._

// Pass the value lists inline:
df.where($"type".isin("type1", "type2") and $"status".isin("completed", "inprogress"))

// Or collect the values in a Seq and expand it with the :_* varargs syntax:
val types = Seq("type1", "type2")
val statuses = Seq("completed", "inprogress")

df.where($"type".isin(types: _*) and $"status".isin(statuses: _*))
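The Scala snippet above needs a live SparkSession to run. As a runnable illustration of the same IN-clause semantics that isin compiles to, here is a sketch using plain Python with the built-in sqlite3 module and a hypothetical tasks table (not Spark):

```python
import sqlite3

# In-memory table standing in for the DataFrame in the answer above (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (type TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [("type1", "completed"), ("type2", "inprogress"), ("type3", "completed")],
)

# The SQL IN clause: a row passes only if both columns match their value lists.
rows = conn.execute(
    "SELECT type, status FROM tasks "
    "WHERE type IN ('type1', 'type2') AND status IN ('completed', 'inprogress')"
).fetchall()
print(rows)  # [('type1', 'completed'), ('type2', 'inprogress')]
conn.close()
```

The type3 row is filtered out even though its status matches, since both predicates are joined with AND, just as in the DataFrame version.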


Using 'as' in the 'WHERE' clause in spark sql


Tag : development , By : jay
Date : March 29 2020, 07:55 AM
Is total a defined column in VIEW, or are you trying to filter on the value of count(*)? A column alias from the SELECT list cannot be referenced in WHERE; if you want to count and then filter on that count, the syntax should be something like:
select <fieldtogroupon>, count(*) as total
from VIEW
group by <fieldtogroupon>
having count(*) > 1
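As a runnable illustration of why the aggregate filter belongs in HAVING (plain Python with sqlite3 and a hypothetical VIEW_DATA table, not Spark):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VIEW_DATA (name TEXT)")
conn.executemany("INSERT INTO VIEW_DATA VALUES (?)", [("a",), ("a",), ("b",)])

# WHERE runs before grouping, so it cannot see count(*); HAVING runs after
# grouping and filters the aggregated rows.
rows = conn.execute(
    "SELECT name, COUNT(*) AS total FROM VIEW_DATA GROUP BY name HAVING COUNT(*) > 1"
).fetchall()
print(rows)  # [('a', 2)]
conn.close()
```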

Which One is faster? Spark SQL with Where clause or Use of Filter in Dataframe after Spark SQL


Tag : hadoop , By : Salikh
Date : March 29 2020, 07:55 AM
Using the explain method to see the physical plan is a good way to compare performance. For example, with the bank table from the Zeppelin tutorial notebook, both of the queries below produce the same physical plan, so there is no performance difference:
sqlContext.sql("select age, job from bank").filter("age = 30").explain
sqlContext.sql("select age, job from bank where age = 30").explain
== Physical Plan ==
Project [age#5,job#6]
+- Filter (age#5 = 30)
   +- Scan ExistingRDD[age#5,job#6,marital#7,education#8,balance#9]
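The Spark comparison above needs a cluster to reproduce. A sketch of the same idea with SQLite's query planner (not Spark; plain Python, hypothetical bank table): a WHERE written directly and a filter applied on top of a subquery flatten to the same plan, analogous to Catalyst producing one physical plan for both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank (age INT, job TEXT)")

# Filter written inside the query.
plan_where = conn.execute(
    "EXPLAIN QUERY PLAN SELECT age, job FROM bank WHERE age = 30"
).fetchall()
# Filter applied after a select, as a wrapping query.
plan_filter = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM (SELECT age, job FROM bank) WHERE age = 30"
).fetchall()

# The optimizer flattens the subquery: the plan details are identical.
print([p[-1] for p in plan_where] == [p[-1] for p in plan_filter])  # True
conn.close()
```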

Spark SQL between timestamp on where clause?


Tag : apache-spark , By : chad
Date : March 29 2020, 07:55 AM
As you are using a Timestamp in your where clause, you need to convert LocalDateTime to Timestamp. Also note that the first parameter of between is the lower bound, so in your case LocalDateTime.now().minusHours(1) should come before LocalDateTime.now(). Then you can do:
import java.time.LocalDateTime
import java.sql.Timestamp

df.where(
     unix_timestamp($"date", "yyyy-MM-dd HH:mm:ss.S")
       .cast("timestamp")
       .between(
          Timestamp.valueOf(LocalDateTime.now().minusHours(1)),
          Timestamp.valueOf(LocalDateTime.now())
       ))
  .show()
+-----+--------------------+
|color|                date|
+-----+--------------------+
|  red|2016-11-29 10:58:...|
+-----+--------------------+
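The bound ordering is the part that trips people up. A minimal plain-Python sketch of the same between check (hypothetical rows mirroring the output above, with a frozen "now" for reproducibility; not Spark):

```python
from datetime import datetime, timedelta

# Stand-in rows: (color, date), mirroring the example output above.
rows = [("red", datetime(2016, 11, 29, 10, 58, 30)),
        ("blue", datetime(2016, 11, 29, 8, 0, 0))]

now = datetime(2016, 11, 29, 11, 30, 0)       # frozen "now" so the result is stable
lower, upper = now - timedelta(hours=1), now  # lower bound must come first

# between is inclusive on both ends: lower <= date <= upper.
kept = [(color, d) for color, d in rows if lower <= d <= upper]
print(kept)  # only the 'red' row falls inside the last hour
```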

Does Spark Support the WITH Clause?


Tag : hadoop , By : Hans-Inge
Date : March 29 2020, 07:55 AM
The WITH statement is not the problem; rather, the INSERT INTO statement is causing trouble. Here's a working example that uses the .insertInto() API instead of an "INSERT INTO" SQL statement:
val s = Seq((1, "foo"), (2, "bar"))
val df = s.toDF("id", "name")
df.registerTempTable("df")

sql("CREATE TABLE edf_final (id int, name string)")

// The WITH clause itself works fine in a plain SELECT...
val e = sql("WITH edf AS (SELECT id + 1 AS id, name FROM df) SELECT * FROM edf")
// ...and the write is done through the API instead of INSERT INTO SQL
e.insertInto("edf_final")
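For contrast, in databases whose SQL dialect does accept a WITH clause on an INSERT, the whole thing can stay in one statement. A runnable sketch with Python's sqlite3 (hypothetical df and edf_final tables mirroring the example above; not Spark):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (id INT, name TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", [(1, "foo"), (2, "bar")])
conn.execute("CREATE TABLE edf_final (id INT, name TEXT)")

# Here the CTE prefixes the INSERT statement itself, so no separate
# .insertInto()-style API call is needed.
conn.execute(
    "WITH edf AS (SELECT id + 1 AS id, name FROM df) "
    "INSERT INTO edf_final SELECT id, name FROM edf"
)

rows = conn.execute("SELECT id, name FROM edf_final ORDER BY id").fetchall()
print(rows)  # [(2, 'foo'), (3, 'bar')]
conn.close()
```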

How to use UDF in where clause in Scala Spark


Tag : scala , By : cthulhup
Date : March 29 2020, 07:55 AM
The question: I'm trying to check whether two Double columns in a DataFrame are equal to a certain degree of precision, so 49.999999 should equal 50. Is it possible to create a UDF and use it in a where clause? I am using Spark 2.0 in Scala.

You can use a udf, but there is no need for one:
import org.apache.spark.sql.functions._

val precision: Double = ???

// Preferred: built-in column functions keep the predicate visible to the optimizer
df.where(abs($"col1" - $"col2") < precision)

// A UDF (yourUdf here stands for one you would define) also works in where,
// but it is opaque to Catalyst and therefore slower
df.where(yourUdf($"col1", $"col2"))
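The tolerance check itself is just abs(col1 - col2) &lt; precision. A minimal plain-Python sketch of that predicate, using the question's own example values (not Spark; precision value is an assumption):

```python
# Tolerance-based equality: 49.999999 should count as equal to 50.
precision = 1e-5  # assumed tolerance, adjust to taste

def approx_equal(a: float, b: float, eps: float = precision) -> bool:
    # Mirrors the DataFrame predicate abs($"col1" - $"col2") < precision
    return abs(a - b) < eps

rows = [(49.999999, 50.0), (49.9, 50.0)]
kept = [pair for pair in rows if approx_equal(*pair)]
print(kept)  # [(49.999999, 50.0)]
```

Only the first pair survives: its difference (about 1e-6) is below the tolerance, while 0.1 is not.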