I'm running into a problem when trying to connect to my MySQL database from my Databricks cluster. At first I thought it was a connectivity issue between the cluster and the database, but when I connect with PyMySQL it works; on the other hand, when I do the same thing with Scala and JDBC, the connection hangs.
This is how I connect with PyMySQL:
import pymysql
import pandas as pd

def queryData():
    host = 'host'
    port = 3306
    mysql = {'host': host,
             'user': '',
             'passwd': '',
             'db': 'dbo',
             'port': port}
    # Open the connection and pull a sample of rows into a DataFrame
    connection = pymysql.connect(**mysql)
    cursor = connection.cursor()
    cursor.execute('SELECT * FROM Job LIMIT 100')
    res = pd.read_sql('SELECT * FROM Job LIMIT 100', connection)
    return res
And this is what I'm trying to do in Scala:
import org.apache.spark.sql.SparkSession
import java.util.Properties

object MySQLSample extends App {

  val spark = SparkSession
    .builder()
    .appName("FiboConn")
    .getOrCreate()

  val url = "jdbc:mysql://host:3306"
  val table = "dbo"

  val properties = new Properties()
  properties.put("user", "user")
  properties.put("password", "pass")

  // Load the MySQL Connector/J driver
  Class.forName("com.mysql.cj.jdbc.Driver")

  // This read is where the connection hangs
  val mySQLDF = spark.read.jdbc(url, table, properties)
  mySQLDF.show()
}
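For comparison, here is a minimal sketch of the same read using Spark's option-based JDBC reader. I'm assuming the database is dbo and the table is Job (as in the PyMySQL query above); the host and credentials are just placeholders:

// Sketch only: the same read expressed through the generic JDBC options.
// Assumes database "dbo" and table "Job"; host/user/password are placeholders.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://host:3306/dbo")
  .option("dbtable", "Job")
  .option("user", "user")
  .option("password", "pass")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .load()

jdbcDF.show()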