pittgoogle.bigquery
Classes to facilitate connections to BigQuery datasets and tables.
- Client — A client for interacting with Google BigQuery.
- Table — Methods and properties for interacting with a Google BigQuery table.
- class pittgoogle.bigquery.Client(auth: Auth = None, client: Client | None = None)[source]
A client for interacting with Google BigQuery.
- property auth: Auth
Credentials for the Google Cloud project that this client will be connected to.
This will be created using environment variables if necessary.
- property client: Client
Google Cloud BigQuery client.
If the client has not been initialized yet, it will be created using Client.auth.
- Returns:
An instance of the Google Cloud BigQuery client.
- Return type:
google.cloud.bigquery.Client
- list_table_names(dataset: str, projectid: str | None = None) list[str] [source]
Get the names of the tables in the dataset.
- Parameters:
dataset (str) – The name of the dataset.
projectid (str, optional) – The dataset owner’s Google Cloud project ID. If None, Client.client.project will be used.
- Returns:
A list of table names in the dataset.
- Return type:
list[str]
Example
bqclient = pittgoogle.bigquery.Client()
bqclient.list_table_names(dataset="ztf", projectid=pittgoogle.ProjectIds().pittgoogle)
- query(query: str, to_dataframe: bool = True, to_dataframe_kwargs: dict | None = None, **job_config_kwargs)[source]
Submit a BigQuery query job.
- Parameters:
query (str) – The SQL query to execute.
to_dataframe (bool, optional) – Whether to fetch the results and return them as a pandas DataFrame (True, default) or just return the query job (False).
to_dataframe_kwargs (dict, optional) – Keyword arguments to be passed to google.cloud.bigquery.QueryJob.to_dataframe. Notable options: dtypes (dict), max_results (int), create_bqstorage_client (bool). This is ignored unless to_dataframe is True. create_bqstorage_client controls whether to use google.cloud.bigquery_storage (True) or google.cloud.bigquery (False). bigquery_storage can be faster but is not necessary. If you do not specify this parameter, pittgoogle will set it to True if the bigquery_storage library is installed, else False.
**job_config_kwargs – Keyword arguments to pass to the google.cloud.bigquery.QueryJobConfig constructor. Notable option: dry_run (bool).
- Returns:
pandas.DataFrame if to_dataframe is True, else google.cloud.bigquery.QueryJob
Example
Query two tables (ztf.alerts_v4_02 and ztf.alerts_v3_3) for data on one object (ZTF19acfixfe).
bqclient = pittgoogle.bigquery.Client()
pittgoogle_project = pittgoogle.ProjectIds().pittgoogle
sql = f"""
    SELECT objectId, candid, candidate.jd, candidate.fid, candidate.magpsf
    FROM `{pittgoogle_project}.ztf.alerts_v3_3`
    WHERE objectId = 'ZTF19acfixfe'
    UNION ALL
    SELECT objectId, candid, candidate.jd, candidate.fid, candidate.magpsf
    FROM `{pittgoogle_project}.ztf.alerts_v4_02`
    WHERE objectId = 'ZTF19acfixfe'
"""
diaobject_df = bqclient.query(query=sql)
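A dry run can validate a query and estimate its cost before any data is processed; dry_run is passed through **job_config_kwargs to QueryJobConfig. The sketch below only builds the SQL string, the live calls (which require credentials) are shown commented out, and the project ID "my-project" is a placeholder.

```python
# Placeholder project ID; in practice use pittgoogle.ProjectIds().pittgoogle.
project = "my-project"

sql = f"""
    SELECT objectId, candid, candidate.jd
    FROM `{project}.ztf.alerts_v4_02`
    WHERE objectId = 'ZTF19acfixfe'
"""

# Validate the query and estimate its cost without running it:
# bqclient = pittgoogle.bigquery.Client()
# job = bqclient.query(query=sql, to_dataframe=False, dry_run=True)
# print(job.total_bytes_processed)
```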
- class pittgoogle.bigquery.Table(name: str, dataset: str, client: Client | None = NOTHING, projectid: str = None)[source]
Methods and properties for interacting with a Google BigQuery table.
- Parameters:
name (str) – Name of the BigQuery table.
dataset (str) – Name of the BigQuery dataset this table belongs to.
projectid (str, optional) – The table owner’s Google Cloud project ID. If not provided, the client’s project ID will be used.
client (google.cloud.bigquery.Client, optional) – BigQuery client that will be used to access the table. If not provided, a new client will be created the first time it is requested.
- dataset: str
Name of the BigQuery dataset this table belongs to.
- classmethod from_cloud(name: str, *, dataset: str | None = None, survey: str | None = None, testid: str | None = None)[source]
Create a Table object using a BigQuery client with implicit credentials.
Use this method when creating a Table object in code running in Google Cloud (e.g., in a Cloud Run module). The underlying Google APIs will automatically find your credentials.
The table resource in Google BigQuery is expected to already exist.
- Parameters:
name (str) – Name of the table.
dataset (str, optional) – Name of the dataset containing the table. Either this or a survey is required. If a testid is provided, it will be appended to this name following the Pitt-Google naming syntax.
survey (str, optional) – Name of the survey. This will be used as the name of the dataset if the dataset kwarg is not provided. This kwarg is provided for convenience in cases where the Pitt-Google naming syntax is used to name resources.
testid (str, optional) – Pipeline identifier. If this is not None, False, or “False”, it will be appended to the dataset name. This is used in cases where the Pitt-Google naming syntax is used to name resources. This allows pipeline modules to find the correct resources without interfering with other pipelines that may have deployed resources with the same base names (e.g., for development and testing purposes).
- Returns:
The Table object.
- Return type:
Table
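The testid handling described above can be sketched as a small helper. This is an illustrative reimplementation of the documented behavior, not pittgoogle’s own code; in particular, the underscore separator is an assumption (the docs only say the testid is “appended”).

```python
def resolve_dataset_name(dataset: str, testid=None) -> str:
    """Append testid to the dataset name unless it is None, False, or "False"."""
    if testid in (None, False, "False"):
        return dataset
    # Assumed separator between the base name and the testid.
    return f"{dataset}_{testid}"
```

For example, resolve_dataset_name("ztf", "mytest") yields a name a test pipeline can use without colliding with resources deployed under the base name "ztf".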
- property id: str
Fully qualified table ID with syntax ‘projectid.dataset.name’.
- insert_rows(rows: list[dict | Alert]) list[dict] [source]
Insert rows into the BigQuery table.
- Parameters:
rows (list[dict or Alert]) – The rows to be inserted. Can be a list of dictionaries or a list of Alert objects.
- Returns:
A list of errors encountered.
- Return type:
list[dict]
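Rows can be supplied as plain dictionaries keyed by column name (Alert objects are also accepted). The column names below are illustrative, and the insert itself, which needs credentials and an existing table, is shown commented out.

```python
# Illustrative rows; the keys must match the target table's column names.
rows = [
    {"objectId": "ZTF19acfixfe", "candid": 1111111111111111111},
    {"objectId": "ZTF19acfixfe", "candid": 2222222222222222222},
]

# tbl = pittgoogle.Table(name="my_table", dataset="my_dataset")
# errors = tbl.insert_rows(rows)
# assert not errors  # an empty list means every row was inserted
```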
- name: str
Name of the BigQuery table.
- property projectid: str
The table owner’s Google Cloud project ID.
Defaults to Table.client.client.project.
- query(*, columns: list[str] | None = None, where: str | None = None, limit: int | str | None = None, to_dataframe: bool = True, dry_run: bool = False, return_sql: bool = False)[source]
Submit a BigQuery query job against this table.
This method supports basic queries against this table. For more complex queries or queries against multiple tables, use Client.query.
- Parameters:
columns (list[str], optional) – List of columns to select. If None, all columns are selected.
where (str, optional) – SQL WHERE clause.
limit (int or str, optional) – Maximum number of rows to return.
to_dataframe (bool, optional) – Whether to fetch the results and return them as a pandas DataFrame (True, default) or just return the query job (False).
dry_run (bool, optional) – Whether to do a dry-run only to check whether the query is valid and estimate costs.
return_sql (bool, optional) – If True, the SQL query string will be returned. The query job will not be submitted.
- Returns:
The SQL query string if return_sql is True. Otherwise, the results in a DataFrame if to_dataframe is True, else the query job.
- Return type:
pandas.DataFrame, google.cloud.bigquery.QueryJob, or str
Example
alerts_tbl = pittgoogle.Table(
    name="alerts_v4_02", dataset="ztf", projectid=pittgoogle.ProjectIds().pittgoogle
)
columns = ["objectId", "candid", "candidate.jd", "candidate.fid", "candidate.magpsf"]
where = "objectId IN ('ZTF18aarunfu', 'ZTF24aavyicb', 'ZTF24aavzkuf')"
diaobjects_df = alerts_tbl.query(columns=columns, where=where)
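When composing a query, return_sql=True and dry_run=True are useful checks before submitting: the first returns the generated SQL string instead of running the job, the second validates the query and estimates its cost. A sketch, with the live calls (which require credentials) commented out:

```python
# Query inputs for Table.query; only plain strings and lists, no cloud access.
columns = ["objectId", "candid", "candidate.magpsf"]
where = "candidate.magpsf < 18"
limit = 100

# alerts_tbl = pittgoogle.Table(
#     name="alerts_v4_02", dataset="ztf", projectid=pittgoogle.ProjectIds().pittgoogle
# )
# sql = alerts_tbl.query(columns=columns, where=where, limit=limit, return_sql=True)
# print(sql)  # inspect the generated SQL before paying for the query
# job = alerts_tbl.query(
#     columns=columns, where=where, limit=limit, to_dataframe=False, dry_run=True
# )
```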
- property schema: pd.DataFrame
Schema of the BigQuery table.
- property table: Table
Google Cloud BigQuery Table object.
Makes a get_table request if necessary.
- Returns:
The BigQuery Table object, connected to the Cloud resource.
- Return type:
google.cloud.bigquery.Table