Skip to content

Storage write api - support default stream#226

Merged
goccy merged 6 commits into
goccy:mainfrom
MatanLevy:main
Jun 12, 2024
Merged

Storage write api - support default stream#226
goccy merged 6 commits into
goccy:mainfrom
MatanLevy:main

Conversation

@MatanLevy

Copy link
Copy Markdown
Contributor

This pull request aims to support bigquery emulator for clients using storage write api with the default stream.

Two main requirements (based on google documentation) for default stream that are part of that pull request:

default stream creation

Additionally, every table has a special stream named ‘_default’ to which data can be written. This stream doesn’t need to be created using CreateWriteStream

commited stream in AppendRows

For COMMITTED streams (which includes the default stream), data is
visible immediately upon successful append.

@hankd-v

hankd-v commented Nov 9, 2023

Copy link
Copy Markdown

+1 to this - @goccy is this repo actively maintained still?

@HiddenSplioGuy

HiddenSplioGuy commented Nov 10, 2023

Copy link
Copy Markdown

+1 actively need this as well as the default streaming method uses the "_default" stream and you can't use connections pool without that : Trying to enable connection pool in non-default stream.

I tried a workaround for this consisting of creating the "_default" write stream after the table but It seems like the API doesn't allow us to set the name of the stream -> OUTPUT ONLY.

var streamName = WriteStreamName.of(bQProjectId, datasetId, tableId, "_default").toString();
CreateWriteStreamRequest request =
    CreateWriteStreamRequest.newBuilder()
        .setParent(TableName.of(bQProjectId, datasetId, tableId).toString())
        .setWriteStream(
            WriteStream.newBuilder()
                .setType(WriteStream.Type.COMMITTED)
                .setName(streamName)
                .build())
        .build();
WriteStream response = bigQueryWriteClient.createWriteStream(request);
System.out.println("WriteStream created " + response);

With the following code, the stream created have a randomly generated name.

@araj-dev

araj-dev commented Nov 20, 2023

Copy link
Copy Markdown

+1 i need this with go client.

Here's my example Go code using the managedwriter package to request the bigquery emulator:

	ctx := context.Background()
	grpcCtx, _ := context.WithTimeout(ctx, 1*time.Second)
	conn, _ := grpc.DialContext(grpcCtx, "localhost:9060",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithIdleTimeout(1*time.Second),
	)
	client, err := managedwriter.NewClient(ctx, projectID,
		option.WithGRPCConn(conn),
		option.WithoutAuthentication(),
	) // OK

	pb := &v1.pb{}
	descriptorProto, err := adapt.NormalizeDescriptor(pb.ProtoReflect().Descriptor())
	tableName := fmt.Sprintf("projects/%s/datasets/%s/tables/%s", projectID, datasetID, "{table_name}")
	stream, err := client.NewManagedStream(ctx,
		managedwriter.WithDestinationTable(tableName),
		managedwriter.WithSchemaDescriptor(descriptorProto),
		managedwriter.WithAppendRowsCallOption(gax.WithTimeout(1*time.Second)),
	) // return error
	
	// rpc error: code = Unknown desc = failed to find stream from projects/{project}/datasets/{dataset}/tables/{table}/streams/_default

When calling NewManagedStream() without specifying the stream type option, it internally invokes getWriteStream with the stream name argument projects/{project}/datasets/{dataset}/tables/{table}/streams/_default.

IMO, the API should automatically create a _default stream for each table if it doesn't exist at the time of the call, or alternatively, prepare a _default stream for all tables during the initialization phase.

Here's a quick fix I tried as an alternative, but I'm not sure if it's the right approach: araj-dev#1

@limmatfun

Copy link
Copy Markdown

+1 for default straem support

@goccy goccy left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test cases for this fixing

@goccy goccy added the reviewed label Apr 6, 2024
@MatanLevy

Copy link
Copy Markdown
Contributor Author

@goccy added test case for default stream

@winterjung

Copy link
Copy Markdown

@goccy Could you please merge this PR into the main branch? I also need it.

@goccy

goccy commented Jun 12, 2024

Copy link
Copy Markdown
Owner

@MatanLevy I would like to check the CI results, so could you push an empty commit to trigger the CI ?

@MatanLevy

Copy link
Copy Markdown
Contributor Author

@goccy pushed now an empty commit

@goccy goccy self-requested a review June 12, 2024 04:09
@goccy

goccy commented Jun 12, 2024

Copy link
Copy Markdown
Owner

Thank you for your contribution !!! LGTM 👍

@goccy goccy merged commit 76384ba into goccy:main Jun 12, 2024
iolalla pushed a commit to iolalla/bigquery-emulator that referenced this pull request Aug 16, 2025
* storage write api support default stream

* commit in case of commited stream

* Storage write api - support default stream: add test cases

* empty commit

---------

Co-authored-by: Matan Levy <malevy@paloaltonetworks.com>
ohaibbq referenced this pull request in Recidiviz/bigquery-emulator Oct 29, 2025
* storage write api support default stream

* commit in case of commited stream

* Storage write api - support default stream: add test cases

* empty commit

---------

Co-authored-by: Matan Levy <malevy@paloaltonetworks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants