BigQuery: Create new Table using schema from json file

I have an App Engine Standard Maven project, where a procedure needs to create a table on BigQuery.

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <version>0.32.0-beta</version>
</dependency>

I'm using the sample code provided here https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud-bigquery#creating-a-table

TableId tableId = TableId.of(datasetId, "my_table_id");
// Table field definition
Field stringField = Field.of("StringField", LegacySQLTypeName.STRING);
// Table schema definition
Schema schema = Schema.of(stringField);
// Create a table
StandardTableDefinition tableDefinition = StandardTableDefinition.of(schema);
Table createdTable = bigquery.create(TableInfo.of(tableId, tableDefinition));

The difference between my case and this example is that I have a JSON file which contains the schema. Here is an example:

[
	{
		"mode": "REQUIRED",
		"name": "identifier",
		"type": "STRING"
	},
	{
		"mode": "REQUIRED",
		"name": "code",
		"type": "STRING"
	},
	{
		"mode": "REQUIRED",
		"name": "description",
		"type": "STRING"
	}
]

I'm unable to find an existing method which loads the table schema from a JSON file, instead of building it manually from the Schema/FieldList/Field classes. Something like:

Schema schema = Schema.parseJson(jsonSchema);

Is there a way to load the JSON file, or do I need to build a custom parser?

While waiting for a reply, I wrote a custom deserializer based on the Gson library. It works, but if there is an existing built-in method, I'll be more than happy to use it.

import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FieldList;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonDeserializer;

public static void main(String[] args) {
    // TODO Load schema from file
    String jsonSchema = "[{\"mode\":\"REQUIRED\",\"name\":\"identifier\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"code\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"description\",\"type\":\"STRING\"}]";

    // Json schema uses "fields"
    // com.google.cloud.bigquery.Field uses "subFields"
    // FIXME Unable to use @SerializedName policy
    jsonSchema = jsonSchema.replace("\"fields\"", "\"subFields\"");

    // Deserialize schema with custom Gson
    Field[] fields = getGson().fromJson(jsonSchema, Field[].class);
    Schema schema = Schema.of(fields);

    System.out.println(schema.toString());
}

public static Gson getGson() {
    // Map the JSON "type" string (e.g. "STRING") onto LegacySQLTypeName
    JsonDeserializer<LegacySQLTypeName> typeDeserializer = (jsonElement, type, deserializationContext) -> {
        return LegacySQLTypeName.valueOf(jsonElement.getAsString());
    };

    // Recursively deserialize nested "subFields" arrays into a FieldList
    JsonDeserializer<FieldList> subFieldsDeserializer = (jsonElement, type, deserializationContext) -> {
        Field[] fields = deserializationContext.deserialize(jsonElement.getAsJsonArray(), Field[].class);
        return FieldList.of(fields);
    };

    return new GsonBuilder()
        .registerTypeAdapter(LegacySQLTypeName.class, typeDeserializer)
        .registerTypeAdapter(FieldList.class, subFieldsDeserializer)
        .create();
}
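
For the TODO (load the schema from a file) and the FIXME (the string replacement of "fields"), one possible refinement is sketched below: read the file with java.nio and register a Gson FieldNamingStrategy that maps the library's subFields member to the JSON key fields, so the raw JSON does not have to be rewritten. The SchemaLoader class, its method names, and the file path are made up for illustration, and the approach assumes the subFields member name noted in the comment above matches the library version in use.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FieldList;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;
import com.google.gson.FieldNamingStrategy;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonDeserializer;

public class SchemaLoader {

    // Same deserializers as in getGson() above, plus a FieldNamingStrategy that
    // maps the Java member "subFields" to the JSON key "fields", so the raw
    // JSON no longer has to be rewritten with String.replace().
    private static Gson gson() {
        JsonDeserializer<LegacySQLTypeName> typeDeserializer =
                (json, type, ctx) -> LegacySQLTypeName.valueOf(json.getAsString());

        JsonDeserializer<FieldList> subFieldsDeserializer = (json, type, ctx) -> {
            Field[] fields = ctx.deserialize(json.getAsJsonArray(), Field[].class);
            return FieldList.of(fields);
        };

        FieldNamingStrategy bqNaming =
                f -> f.getName().equals("subFields") ? "fields" : f.getName();

        return new GsonBuilder()
                .setFieldNamingStrategy(bqNaming)
                .registerTypeAdapter(LegacySQLTypeName.class, typeDeserializer)
                .registerTypeAdapter(FieldList.class, subFieldsDeserializer)
                .create();
    }

    // Reads the schema definition from a local JSON file (path is hypothetical)
    // and deserializes it into a BigQuery Schema
    public static Schema loadSchema(String path) throws IOException {
        String jsonSchema = new String(
                Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        Field[] fields = gson().fromJson(jsonSchema, Field[].class);
        return Schema.of(fields);
    }
}

With something like this, the hardcoded string in main() would be replaced by a call such as SchemaLoader.loadSchema("schema/my_table.json") (path chosen for the example).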

Answer by marengaz:

Here's a Scala implementation using the BigQuery DTO objects (so it works with nested fields too). I'm sure you can adapt it to something similar in Java:

json -> schema

import com.google.api.client.json.jackson2.JacksonFactory
import com.google.api.services.bigquery.model.{TableFieldSchema, TableSchema}
import com.google.cloud.bigquery.Schema

object BqJsonToBqSchema {
  private val parser = new JacksonFactory()

  private def dtoTableSchemaToBqSchema(dtoSchema: TableSchema): Schema = {
    val fromPbMethod =
      classOf[Schema]
        .getDeclaredMethods
        .toIterable
        .find(method => method.getName == "fromPb")
        .get

    fromPbMethod.setAccessible(true)
    fromPbMethod.invoke(null, dtoSchema).asInstanceOf[Schema]
  }

  def convert(jsonSchemaString: String): Schema = {
    val fieldsListDto =
      parser
        .createJsonParser(jsonSchemaString)
        .parseArray(classOf[java.util.ArrayList[TableFieldSchema]], classOf[TableFieldSchema])
        .asInstanceOf[java.util.ArrayList[TableFieldSchema]]

    val schemaDto =  new TableSchema()
    schemaDto.setFields(fieldsListDto)

    dtoTableSchemaToBqSchema(schemaDto)
  }
}
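
If you need the same thing in Java, a rough adaptation of the json -> schema direction could look like the sketch below. Like the Scala version, it reaches the package-private Schema.fromPb(TableSchema) through reflection, so it may break between library versions; the class and method names here are just placeholders.

import java.lang.reflect.Method;
import java.util.List;

import com.google.api.client.json.JsonParser;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.bigquery.Schema;

public class BqJsonToBqSchema {

    private static final JacksonFactory PARSER = new JacksonFactory();

    // Parses the JSON array of column definitions into the BigQuery DTO model,
    // then converts the DTO TableSchema into a client-library Schema by calling
    // the package-private Schema.fromPb(TableSchema) through reflection.
    public static Schema convert(String jsonSchemaString) throws Exception {
        JsonParser parser = PARSER.createJsonParser(jsonSchemaString);

        @SuppressWarnings("unchecked")
        List<TableFieldSchema> fields = (List<TableFieldSchema>)
                parser.parseArray(java.util.ArrayList.class, TableFieldSchema.class);

        TableSchema schemaDto = new TableSchema();
        schemaDto.setFields(fields);

        Method fromPb = null;
        for (Method m : Schema.class.getDeclaredMethods()) {
            if (m.getName().equals("fromPb")) {
                fromPb = m;
            }
        }
        if (fromPb == null) {
            throw new IllegalStateException("Schema.fromPb not found");
        }
        fromPb.setAccessible(true);
        return (Schema) fromPb.invoke(null, schemaDto);
    }
}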

schema -> json

import com.google.cloud.bigquery.{Schema => BqSchema}
import com.google.api.services.bigquery.model.{TableSchema => BqTableSchema}
import org.codehaus.jackson.map.ObjectMapper

object BqSchemaToBqJson {

  private def bqSchemaToJsonString(schema: BqSchema): String = {
    val toPbMethod =
      classOf[BqSchema]
        .getDeclaredMethods
        .toIterable
        .find(method => method.getName == "toPb")
        .get
    toPbMethod.setAccessible(true)
    val tableSchema = toPbMethod.invoke(schema).asInstanceOf[BqTableSchema]

    val objMapper = new ObjectMapper()
    objMapper.writeValueAsString(tableSchema.getFields)
  }

  def convert(bqSchema: BqSchema): String = {
    bqSchemaToJsonString(bqSchema)
  }

}
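
A rough Java counterpart of the schema -> json direction might look like the following, again leaning on reflection (this time against the package-private Schema.toPb()) and assuming a Jackson 2 ObjectMapper (com.fasterxml) is on the classpath instead of the older org.codehaus one used above; the class name is again a placeholder.

import java.lang.reflect.Method;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.bigquery.Schema;

public class BqSchemaToBqJson {

    // Converts a client-library Schema back to the DTO TableSchema via the
    // package-private Schema.toPb() method, then serializes its field list
    // to a JSON string with Jackson (any JSON mapper would do here).
    public static String convert(Schema schema) throws Exception {
        Method toPb = null;
        for (Method m : Schema.class.getDeclaredMethods()) {
            if (m.getName().equals("toPb")) {
                toPb = m;
            }
        }
        if (toPb == null) {
            throw new IllegalStateException("Schema.toPb not found");
        }
        toPb.setAccessible(true);
        TableSchema tableSchema = (TableSchema) toPb.invoke(schema);

        return new ObjectMapper().writeValueAsString(tableSchema.getFields());
    }
}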