A Collection of Problems Encountered in PySpark

TypeError: 'Builder' object is not callable Spark structured streaming

While following the PySpark quick start, one of the demos failed to run and raised an error.

"""SimpleApp.py"""
from pyspark.sql import SparkSession

logFile = "YOUR_SPARK_HOME/README.md"  # Should be some file on your system
spark = SparkSession.builder().appName(appName).master(master).getOrCreate()
logData = spark.read.text(logFile).cache()

numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

spark.stop()

The error output:

Traceback (most recent call last):
  File "SimpleApp.py", line 5, in <module>
    spark = SparkSession.builder().appName(appName).master(master).getOrCreate()
TypeError: 'Builder' object is not callable

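This is actually a bug on the Spark side rather than in the script: the example as published was wrong, and the documented usage has since been changed. In PySpark, SparkSession.builder is an attribute that already holds a Builder object, not a method, so it must be written without parentheses.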

References: the same question on Stack Overflow; the related Spark JIRA issue.

Changing the call to the following form fixes it:

spark = SparkSession.builder.appName("simple app").getOrCreate()
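To see why the parentheses fail, here is a minimal sketch that reproduces the same error without Spark. The Builder and Session classes are hypothetical stand-ins written for illustration, not PySpark's actual implementation; they only assume that the builder is exposed as an attribute holding an object rather than as a method.

```python
class Builder:
    """Hypothetical stand-in for the builder object (not PySpark code)."""

    def __init__(self):
        self.options = {}

    def appName(self, name):
        # Record the option and return self so that calls can be chained.
        self.options["app.name"] = name
        return self


class Session:
    """Hypothetical stand-in for SparkSession (not PySpark code)."""

    # A class attribute that holds a Builder instance; it is not a method.
    builder = Builder()


# Correct: plain attribute access followed by chained method calls.
configured = Session.builder.appName("simple app")

# Incorrect: calling the instance itself raises TypeError,
# because Builder does not define __call__.
try:
    Session.builder()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'Builder' object is not callable
```

The message matches the one in the traceback above: Python looks up the attribute, gets an ordinary object back, and then fails when asked to call it.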