Skip to content

PHOENIX-7878 CDC perf improvement - skip redundant cell versions on data table scans#2493

Open
virajjasani wants to merge 1 commit into
apache:masterfrom
virajjasani:PHOENIX-7878-master
Open

PHOENIX-7878 CDC perf improvement - skip redundant cell versions on data table scans#2493
virajjasani wants to merge 1 commit into
apache:masterfrom
virajjasani:PHOENIX-7878-master

Conversation

@virajjasani
Copy link
Copy Markdown
Contributor

Jira: PHOENIX-7878

What changes were proposed in this pull request?

CDC perf improvement - skip redundant cell versions on data table scans

Why are the changes needed?

When a CDC query runs with pre, post, and/or change scopes, it scans the data table to reconstruct each change event (the change image plus the pre-image, and for the consumer path the full data-row state). Today that data table scan is a raw, all-versions scan, so for every data row we read back every version of every column - even though, for a given batch of changes, we only need two cells per column per change: the cell at the change timestamp, and the most recent cell just below it (the pre-image). On rows that are updated frequently this means we read, transfer, and process far more cells than the event reconstruction actually uses, which adds CPU, memory, and network overhead to CDC reads.

The purpose of this Jira is to add new CDCVersionFilter, in addition to SkipScanFilter on the data table scans. For each row it is given the set of change timestamps from the current batch and keeps only the cells that matter: the cell at each change timestamp, the first cell below each change timestamp (the pre-image), and all DeleteFamily markers (needed for deletion tracking), other cells are skipped to avoid redundant data transfer.

Does this PR introduce any user-facing change?

This is performance improvement

How was this patch tested?

UT and IT tests

Was this patch authored or co-authored using generative AI tooling?

Claude Opus 4.8

@virajjasani
Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant