Load 130,000 CSV files to BigQuery: load job limit

ivo
Bronze 1

Hello, I have 130,000 CSV files in GCS that I am uploading via bigquery.Client(). The problem is that apparently there is a limit of 1,500 load jobs per table per day. My code uploads the files one by one, so BigQuery counts each file as an individual load job, which means I can only load 1,500 files per day. I have to finish this task very soon. Can anyone help? I would appreciate it a lot! 🙂

Hello,

Considering that you have reached the limit of 1,500 load jobs per table per day without any job failures, I suggest either of the following:

  1. Combine the load jobs so that each one loads many files at once, making them larger and less frequent (see the first sketch after this list).
  2. Try using the BigQuery Storage Write API instead (see the second sketch after this list). As stated in the documentation, here are the advantages of using the Storage Write API:
    1. Exactly-once delivery semantics. The Storage Write API supports exactly-once semantics through the use of stream offsets. Unlike the tabledata.insertAll method, the Storage Write API never writes two messages that have the same offset within a stream, if the client provides stream offsets when appending records.
    2. Stream-level transactions. You can write data to a stream and commit the data as a single transaction. If the commit operation fails, you can safely retry the operation.
    3. Transactions across streams. Multiple workers can create their own streams to process data independently. When all the workers have finished, you can commit all of the streams as a transaction.
    4. Efficient protocol. The Storage Write API is more efficient than the older insertAll method because it uses gRPC streaming rather than REST over HTTP. The Storage Write API also supports binary formats in the form of protocol buffers, which are a more efficient wire format than JSON. Write requests are asynchronous with guaranteed ordering.
    5. Schema update detection. If the underlying table schema changes while the client is streaming, then the Storage Write API notifies the client. The client can decide whether to reconnect using the updated schema, or continue to write to the existing connection.
    6. Lower cost. The Storage Write API has a significantly lower cost than the older insertAll streaming API. In addition, you can ingest up to 2 TiB per month for free.
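
For option 1, a single load job can reference many GCS files at once, either through a wildcard URI or a list of URIs, so a handful of jobs can cover all 130,000 files and stay well under the 1,500-jobs-per-day limit. Below is a minimal sketch with the Python client; the project, dataset, table, and bucket names are placeholders, and the CSV options (header row, autodetected schema) are assumptions to adjust to your files.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and bucket prefix.
table_id = "my-project.my_dataset.my_table"
uri = "gs://my-bucket/exports/*.csv"  # wildcard covers every CSV under the prefix

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assumes each file has a header row
    autodetect=True,       # or supply an explicit schema instead
)

# One load job for many files, instead of one job per file.
load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for the job to finish

print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}.")
```

If the files do not share a common prefix, load_table_from_uri also accepts a list of URIs, so you can chunk the file list into a few large batches.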
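
For option 2, the following is only an outline of appending rows through the Storage Write API's default stream with the google-cloud-bigquery-storage Python client. The row_pb2 module and its Row message (with a single name field) are hypothetical stand-ins for a protoc-generated class matching your table schema, and note that the default stream provides at-least-once semantics; exactly-once delivery via offsets requires application-created streams.

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import row_pb2  # hypothetical protoc-generated module describing one table row


def append_rows(project_id: str, dataset_id: str, table_id: str) -> None:
    write_client = bigquery_storage_v1.BigQueryWriteClient()
    parent = write_client.table_path(project_id, dataset_id, table_id)
    stream_name = f"{parent}/_default"  # default stream, no explicit commit step

    # Describe the serialized protobuf rows to the API.
    proto_schema = types.ProtoSchema()
    proto_descriptor = descriptor_pb2.DescriptorProto()
    row_pb2.Row.DESCRIPTOR.CopyToProto(proto_descriptor)
    proto_schema.proto_descriptor = proto_descriptor

    request_template = types.AppendRowsRequest()
    request_template.write_stream = stream_name
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.writer_schema = proto_schema
    request_template.proto_rows = proto_data

    append_rows_stream = writer.AppendRowsStream(write_client, request_template)

    # Batch several rows into a single append request.
    proto_rows = types.ProtoRows()
    for value in ("alpha", "beta"):
        row = row_pb2.Row()
        row.name = value  # hypothetical field on the hypothetical Row message
        proto_rows.serialized_rows.append(row.SerializeToString())

    request = types.AppendRowsRequest()
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.rows = proto_rows
    request.proto_rows = proto_data

    future = append_rows_stream.send(request)
    future.result()  # block until the append is acknowledged
    append_rows_stream.close()
```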

Hope this helps.