DataFrameSuiteBase
DataFrameSuiteBase lets you check whether two DataFrames are equal. It also provides easy access to a SparkContext (sc) and SQLContext (sqlContext). Both are initialized before all test cases, so you can use them inside any test case.
For Java users the same functionality is supported by JavaDataFrameSuiteBase.
You can assert that two DataFrames are equal using the assertDataFrameEquals method.
Additional Requirements
In early versions of spark-testing-base, the spark-hive dependency was marked as provided, so you may need to add the spark-hive package to your build if you are writing DataFrame tests (it can be included in the test scope only).
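As a rough illustration of the note above, an sbt build might pull in spark-hive for tests only. This is a sketch under assumptions: the sparkVersion value is a placeholder and should match the Spark version your project already uses.

```scala
// build.sbt (sketch)
// Assumption: placeholder version; use the Spark version of your own project.
val sparkVersion = "3.5.0"

// Add spark-hive in the test scope only, so it is not part of the main artifact.
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % Test
```

Maven or Gradle builds would need the equivalent test-scoped dependency.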
Example:
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class test extends FunSuite with DataFrameSuiteBase {
  test("simple test") {
    // Bring the toDF conversion into scope.
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    val input1 = sc.parallelize(List(1, 2, 3)).toDF
    assertDataFrameEquals(input1, input1) // equal

    val input2 = sc.parallelize(List(4, 5, 6)).toDF
    // The assertion fails for unequal DataFrames, so expect a test failure.
    intercept[org.scalatest.exceptions.TestFailedException] {
      assertDataFrameEquals(input1, input2) // not equal
    }
  }
}

When DataFrames contain doubles, you can compare them within an acceptable tolerance, for example treating 5 and 4.999 as equal. You can assert that two DataFrames are approximately equal using the assertDataFrameApproximateEquals method.
Example:
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class test extends FunSuite with DataFrameSuiteBase {
  test("simple test") {
    // Bring the toDF conversion into scope.
    val sqlCtx = sqlContext
    import sqlCtx.implicits._

    // The double column differs by about 0.1 in every row.
    val input1 = sc.parallelize(List[(Int, Double)]((1, 1.1), (2, 2.2), (3, 3.3))).toDF
    val input2 = sc.parallelize(List[(Int, Double)]((1, 1.2), (2, 2.3), (3, 3.4))).toDF

    assertDataFrameApproximateEquals(input1, input2, 0.11) // equal within tolerance
    intercept[org.scalatest.exceptions.TestFailedException] {
      assertDataFrameApproximateEquals(input1, input2, 0.05) // not equal: tolerance too small
    }
  }
}
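
To make the role of the tolerance concrete, here is a minimal plain-Scala sketch of tolerance-based comparison. It is not the library's implementation: ToleranceSketch, approxEqual and allApproxEqual are hypothetical names, and the inclusive bound (difference at most tol) is an assumption.

```scala
// Hypothetical helper, not part of spark-testing-base.
object ToleranceSketch {
  /** True when a and b differ by at most tol (inclusive bound is an assumption). */
  def approxEqual(a: Double, b: Double, tol: Double): Boolean =
    math.abs(a - b) <= tol

  /** Compares two sequences of doubles element by element within tol. */
  def allApproxEqual(xs: Seq[Double], ys: Seq[Double], tol: Double): Boolean =
    xs.length == ys.length &&
      xs.zip(ys).forall { case (x, y) => approxEqual(x, y, tol) }

  def main(args: Array[String]): Unit = {
    // Same double values as the example above.
    val left = Seq(1.1, 2.2, 3.3)
    val right = Seq(1.2, 2.3, 3.4)
    println(allApproxEqual(left, right, 0.11)) // true: every difference is about 0.1
    println(allApproxEqual(left, right, 0.05)) // false: 0.1 exceeds the tolerance
  }
}
```

Choosing a tolerance slightly larger than the expected difference (0.11 rather than 0.1) also avoids surprises from floating-point rounding.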