Dynamic Unpivot

  • I have a table I didn't design that has tons of repeating groups in it... here's an abbreviated version:

    USE tempdb;
    go
    CREATE TABLE [dbo].[Toxicity2](
    [RecordID] [int] IDENTITY(1,1) NOT NULL,
    [PatientID] [int] NOT NULL,
    [Cycle] [tinyint] NOT NULL,
    [ANEMIA] [tinyint] NULL,
    [Causality1] [tinyint] NULL,
    [Relatedness1] [tinyint] NULL,
    [ALOPECIA] [tinyint] NULL,
    [Causality2] [tinyint] NULL,
    [Relatedness2] [tinyint] NULL,
    [ANOREXIA] [tinyint] NULL,
    [Causality3] [tinyint] NULL,
    [Relatedness3] [tinyint] NULL
    ) ON [PRIMARY]
    GO

    /* add some dummy data */
    INSERT INTO Toxicity2 (
    PatientID, Cycle,
    ANEMIA, Causality1, Relatedness1,
    ALOPECIA, Causality2, Relatedness2,
    ANOREXIA, Causality3, Relatedness3
    ) VALUES
    (10001, 1,
    null, null, null,
    2, 1, 1,
    3, 2, 1),
    (10002, 1,
    2,1,1,
    null, null, null,
    2, 1, 2);

    I can use CROSS APPLY (VALUES...) to unpivot it, like this:

    /* this hard codes the column names... I wanted to make them dynamic */

    SELECT
    PatientID,
    Cycle,
    ca.Symptom,
    ca.Grade,
    ca.Causality,
    ca.Relatedness
    FROM Toxicity2
    CROSS APPLY (
    VALUES
    /* how do I write dynamic SQL to do this part?
    I can identify the groupings by finding all the
    columns LIKE 'Causality%' and then use COL_NAME() etc
    to get the column names */
    (ANEMIA,'Anemia',Causality1, Relatedness1),
    (ALOPECIA, 'Alopecia', Causality2, Relatedness2),
    (ANOREXIA, 'Anorexia', Causality3, Relatedness3)
    ) ca (Grade, Symptom, Causality, Relatedness)
    WHERE ca.Grade IS NOT NULL;

    The obvious pattern is to find all the Causality% columns, and then basically append them all so I get one SQL string (I know the PatientID and Cycle stuff has to be prepended to this, but this is where my code is doing something stupid. For some reason, my looping logic appears to be wrong because I only get one set of values (the Anemia row), when I should get all 3.

    /*
    The idea is to build a list of the values to cross apply
    */
    DECLARE @DynSQL NVARCHAR(MAX);
    DECLARE @RowValue NVARCHAR(200);

    DECLARE field_list_cursor
    CURSOR LOCAL FAST_FORWARD
    FOR

    SELECT RowValue = '(' + QUOTENAME(tc.[name]) + ', ' +
    '''' + tc.Name + '''' + ', ' + tc.Cn + ', '
    + tc.Rn + ')'
    FROM
    (
    SELECT[name],
    column_id,
    /* read the next two row values Causality[n] and Relatedness[n] */
    LEAD([name],1) OVER (ORDER BY ac.column_id) AS Cn,
    LEAD([name],2) OVER (ORDER BY ac.column_id) AS Rn
    FROM sys.all_columns ac
    WHERE ac.object_id = OBJECT_ID('Toxicity')
    ) tc
    WHERE tc.Cn LIKE 'Causality%';

    OPEN field_list_cursor;

    -- perform the first fetch
    FETCH NEXT FROM field_list_cursor INTO @RowValue;

    PRINT 'RowValue: ' + @RowValue;
    -- check @@fetch_status to see if there are any more rows to fetch

    WHILE @@FETCH_STATUS = 0
    BEGIN
    FETCH NEXT FROM field_list_cursor INTO @RowValue;
    SET @DynSQL += ', ' + @RowValue;
    PRINT @DynSQL;
    END

    CLOSE field_list_cursor;
    DEALLOCATE field_list_cursor;

    (maybe I'll see the mistake I'm making in the morning... right now I can't see it, but it's only returning the first record.) What am I doing wrong?

    If it matters, I'm basically building the CROSS APPLY section in the query above so that it's dynamic. I'm just doing something dumb. =(

    Thanks for any pointers. Don't waste your time rewriting a lot of stuff, just point out my mistake.

    Pieter

  • Hi Pieter,

    Once you unpivot into (Symptom, Grade, Causality, Relatedness) you’ve gotten rid of the original “slot” number (the 1, 2, 3 suffixes). Maybe that matters? It might if you later need to order by the original slot (in the grouping) or to pivot back to the original columns. Before you aggregate the rows, you could carry the suffix forward as a column when you unpivot

    CROSS APPLY (VALUES
    (1, ANEMIA , 'Anemia' , Causality1, Relatedness1),
    (2, ALOPECIA, 'Alopecia', Causality2, Relatedness2),
    (3, ANOREXIA, 'Anorexia', Causality3, Relatedness3)
    ) ca (Seq, Grade, Symptom, Causality, Relatedness)

    When you convert the query to dynamic SQL you could generate the 'Seq' column values from metadata (or a series generator). However you want to come up with the symptoms list is up to you.

    Then you could use STRING_AGG to combine the row values

    SELECT t.PatientID, t.Cycle,
    STRING_AGG(CONVERT(varchar(10), ca.Grade), ',') WITHIN GROUP (ORDER BY ca.Seq) AS GradesInSlotOrder
    FROM dbo.Toxicity2 AS t
    CROSS APPLY (VALUES
    (1, ANEMIA , 'Anemia' , Causality1, Relatedness1),
    (2, ALOPECIA, 'Alopecia', Causality2, Relatedness2),
    (3, ANOREXIA, 'Anorexia', Causality3, Relatedness3)
    ) ca (Seq, Grade, Symptom, Causality, Relatedness)
    WHERE ca.Grade IS NOT NULL
    GROUP BY t.PatientID, t.Cycle;

    Right now the WHERE clause filters out NULL values for ca.Grade. If you wanted to preserve a slot in 'GradesInSlotOrder' for NULL values you could add COALESCE to the expression and remove the WHERE clause so the 'GradesInSlotOrder' strings would contain doubled commas for missing slots (e.g. 2,,3)

    SELECT t.PatientID, t.Cycle,
    STRING_AGG(COALESCE(CONVERT(varchar(10), ca.Grade), ''), ',') WITHIN GROUP (ORDER BY ca.Seq) AS GradesInSlotOrder
    FROM dbo.Toxicity2 AS t
    CROSS APPLY (VALUES
    (1, ANEMIA , 'Anemia' , Causality1, Relatedness1),
    (2, ALOPECIA, 'Alopecia', Causality2, Relatedness2),
    (3, ANOREXIA, 'Anorexia', Causality3, Relatedness3)
    ) ca (Seq, Grade, Symptom, Causality, Relatedness)
    GROUP BY t.PatientID, t.Cycle;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Oh, I STRING_AGG WITHIN GROUP was a good catch. Thanks for that!

    The bigger problem is that I have lots of tables with the same hideous design. That's what I was trying to sort out by finding the Causality columns and looking "left and right" to build the list dynamically.

    The way I thought it should work was to read the columns from sys.all_columns and then retrieve the sets of columns and then append that to the 'CROSS APPLY (VALUES (' + @TheList + ')'  ... kind of thing. Do I have to use a cursor for that? (Well, at least I'm looping over maybe 80 columns and not millions of records!)

  • The design is annoying but maybe not unfixable if you could encapsulate access (queries, views, etc.). Imo the bigger issue is whether the table(s) store when things happened (datetime per symptom/cycle/etc.). If that’s missing just unpivoting the columns into (a) normalized table(s) wouldn't "fix" much. Also, since you end up combining all rows from different tables it could make sense to include the table name in the unpivot output so rows are traceable back to source later.

    For the dynamic part define the pattern clearly:

    - Is it always Causality + digits only (Causality1, Causality2, ...)? Or could it be Causality_1, Causality01, etc.? The query should strictly enforce the pattern with a typed check (PATINDEX [0-9][0-9]etc. on the suffix(es)) so it doesn't accidentally pick up columns like CausalityInsurance, CausalityDetail, etc.

    - Is a slot only valid if all 3 columns exist (Grade, CausalityN, RelatednessN)?

    - Is the Grade column always immediately to the left of CausalityN? Are there other positional or offset rules?

    To find the list of tables that contain CausalityN-style columns a loop/cursor is not necessary. To query for the columns for each of multiple tables a loop is needed. You cannot pass a set of table names into a single FROM clause which is what's necessary here. Imo the only necessary loop is over tables and not columns. Maybe the query could use a WHILE loop over tables to collect all of the needed columns

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • The basic pattern is SymptomName, Causality#, Relatedness#

    It's oncology stuff, so there are no other column names like that. Causality and Relatedness all have numeric suffixes from 1 to however many they need. The stupid thing is that every drug protocol (study database) has only one "Toxicity" table like this, and they have varying numbers of these sets, and the symptom names aren't constant.

    I was thinking of basically denying access to the base table and force the users to suffer through a table valued function to return a normalized view of the data. (either that or append the stuff to a final table, but then you have to go looking for updates which is a hassle).

    The grade part is weird. The value for the grade is stored in the column in all caps (that's the symptom). That's why the weird 'ColumnName' AS Grade is there.

    The way I was thinking of solving it (ha ha ha) was to make the chunk inside the VALUES () section the result of a function or that returned the sets of columns I needed....

    SELECT t.PatientID, t.Cycle,
    STRING_AGG(COALESCE(CONVERT(varchar(10), ca.Grade), ''), ',') WITHIN GROUP (ORDER BY ca.Seq) AS GradesInSlotOrder
    FROM dbo.Toxicity2 AS t
    CROSS APPLY (VALUES
    /*---- START REPLACE ------*/
    (1, ANEMIA , 'Anemia' , Causality1, Relatedness1),
    (2, ALOPECIA, 'Alopecia', Causality2, Relatedness2),
    (3, ANOREXIA, 'Anorexia', Causality3, Relatedness3)
    /*---- END REPLACE ------*/
    ) ca (Seq, Grade, Symptom, Causality, Relatedness)
    GROUP BY t.PatientID, t.Cycle;

    Apologies for being redundant, but I wanted to make sure I answered your entire question.

    For the dynamic part define the pattern clearly:

    - Is it always Causality + digits only (Causality1, Causality2, ...)? Or could it be Causality_1, Causality01, etc.? The query should strictly enforce the pattern with a typed check (PATINDEX [0-9][0-9]etc. on the suffix(es)) so it doesn't accidentally pick up columns like CausalityInsurance, CausalityDetail, etc.

    - Is a slot only valid if all 3 columns exist (Grade, CausalityN, RelatednessN)?

    Yes. the Causality## and Relatedness c0lumn names always end with a numeric suffix. They're 1,2,3 not 01, 02, 03.  No need to worry about other variants. The weird thing is that the Grade isn't explicitly in the table. The value for the grade is stored in the column in all caps (the Symptom or Toxicity column).

    There isn't really a case where Causality would exist without Relatedness.

    Thanks for looking at this. Did I answer all your questions?

  • The basic pattern is SymptomName, Causality#, Relatedness#

    It's oncology stuff, so there are no other column names like that. Causality and Relatedness all have numeric suffixes from 1 to however many they need. The stupid thing is that every drug protocol (study database) has only one "Toxicity" table like this, and they have varying numbers of these sets, and the symptom names aren't constant.

    I was thinking of basically denying access to the base table and force the users to suffer through a table valued function to return a normalized view of the data. (either that or append the stuff to a final table, but then you have to go looking for updates which is a hassle).

    The grade part is weird. The value for the grade is stored in the column in all caps (that's the symptom). That's why the weird 'ColumnName' AS Grade is there.

    The way I was thinking of solving it (ha ha ha) was to make the chunk inside the VALUES () section the result of a function or that returned the sets of columns I needed....

    SELECT t.PatientID, t.Cycle,
    STRING_AGG(COALESCE(CONVERT(varchar(10), ca.Grade), ''), ',') WITHIN GROUP (ORDER BY ca.Seq) AS GradesInSlotOrder
    FROM dbo.Toxicity2 AS t
    CROSS APPLY (VALUES
    /*---- START REPLACE ------*/
    (1, ANEMIA , 'Anemia' , Causality1, Relatedness1),
    (2, ALOPECIA, 'Alopecia', Causality2, Relatedness2),
    (3, ANOREXIA, 'Anorexia', Causality3, Relatedness3)
    /*---- END REPLACE ------*/
    ) ca (Seq, Grade, Symptom, Causality, Relatedness)
    GROUP BY t.PatientID, t.Cycle;

    Apologies for being redundant, but I wanted to make sure I answered your entire question.

    For the dynamic part define the pattern clearly:

    - Is it always Causality + digits only (Causality1, Causality2, ...)? Or could it be Causality_1, Causality01, etc.? The query should strictly enforce the pattern with a typed check (PATINDEX [0-9][0-9]etc. on the suffix(es)) so it doesn't accidentally pick up columns like CausalityInsurance, CausalityDetail, etc.

    - Is a slot only valid if all 3 columns exist (Grade, CausalityN, RelatednessN)?

    Yes. the Causality## and Relatedness c0lumn names always end with a numeric suffix. They're 1,2,3 not 01, 02, 03.  No need to worry about other variants. The weird thing is that the Grade isn't explicitly in the table. The value for the grade is stored in the column in all caps (the Symptom or Toxicity column).

    There isn't really a case where Causality would exist without Relatedness.

    Thanks for looking at this. Did I answer all your questions?

  • Yes that answers it.  It seems possible to create an iTVF per table/schema (if it's hard coded explicitly referencing columns and table(s)) that gives users what they're looking for.  Not with the dynamics tho

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • I was thinking of writing an iTVF to basically do the CROSS APPLY stuff for me. I'd do it tonight, but SQL Saturday is tomorrow... so I'll try it this weekend. Seems like I should be able to pass a table and then in the code, find the pattern and build the cross apply sets dynamically. So I build the SQL and then at the end use sp_execute() to execute it, and then return the result, right?

  • (Apparently, I like to double-post... and the checker here doesn't work.)

    • This reply was modified 12 hours, 41 minutes ago by pietlinden.
    • This reply was modified 12 hours, 41 minutes ago by pietlinden.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply