This article provides example of reading data from Google BigQuery as pandas DataFrame.
Prerequisites
Refer to Pandas - Save DataFrame to BigQuery to understand the prerequisites to setup credential file and install pandas-gbq package. The permissions required for read from BigQuery is different from loading data into BigQuery; so please setup your service account permission accordingly.
The source table used in this tutorial looks like the screenshot below:
About to\_gbq function
In this tutorial, we directly use read_gbqfunction. Refer to pandas.read_gbq — pandas 1.2.3 documentation (pydata.org) for more details.
The signature of the function looks like the following:
pandas.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False, auth_local_webserver=False, dialect=None, location=None, configuration=None, credentials=None, use_bqstorage_api=None, max_results=None, progress_bar_type=None)
Create a python script file
We start to create a python script file named pd-from-bq.pywith the following content:
import pandas as pd
from google.oauth2.service_account import Credentials
# Define source table in BQ
source_table = "YOUR_DATA_SET.pandas"
project_id = "YOUR_PROJECT_ID"
credential_file = "PATH_TO_YOUR_SERVICE_ACCOUNT_CREDENTIAL_FILE.json"
credential = Credentials.from_service_account_file(credential_file)
# Location for BQ job, it needs to match with destination table location
job_location = "australia-southeast1"
# Save Pandas dataframe to BQ
df = pd.read_gbq("select * from "+source_table, project_id=project_id,
location=job_location, credentials=credential)
print(df)
Please change the variables accordingly to reflect your environment and BigQuery setup.
Run the script
Run the script using the following command:
python ./pd-from-bq.py
The output looks like the following:
DATE TYPE SALES
0 2020-01-01 TypeA 1000
1 2020-01-01 TypeB 200
2 2020-01-01 TypeC 300
3 2020-02-01 TypeA 700
4 2020-02-01 TypeB 400
5 2020-02-01 TypeC 500
6 2020-03-01 TypeA 300
7 2020-03-01 TypeB 900
8 2020-03-01 TypeC 100
infoDue to pyarrow version issues in my Cloud Shell environment, I had to use Python 2.7 to run the script. Alternatively, you can also install an older version of pyarrow. For example: pip3 install pyarrow==1.0.0