SAS - Replicate multiple observations across rows

up vote
0
down vote

favorite

I have a data structure that looks like this:

DATA have ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 1000

1 2 2 2000

1 2 3 3000

1 2 4 4000

1 2 5 5000

1 3 1 .

1 3 2 .

1 3 3 .

1 3 4 .

1 3 5 .

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 .

2 2 2 .

2 2 3 .

2 2 4 .

2 2 5 .

2 3 1 41000

2 3 2 39000

2 3 3 24000

2 3 4 32000

2 3 5 53000

RUN ;

So, we have family id, individual id, implicate number and imputed income for each implicate.

What i need is to replicate the results of the first individual in each family (all of the five implicates) for the remaining individuals within each family, replacing whatever values we previously had on those cells, like this:

DATA want ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 40000

1 2 2 25000

1 2 3 34000

1 2 4 23555

1 2 5 49850

1 3 1 40000

1 3 2 25000

1 3 3 34000

1 3 4 23555

1 3 5 49850

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 40000

2 2 2 45000

2 2 3 50000

2 2 4 34000

2 2 5 23500

2 3 1 40000

2 3 2 45000

2 3 3 50000

2 3 4 34000

2 3 5 23500

RUN ;

In this example I'm trying to replicate only one variable but in my project I will have to do this for dozens of variables.

So far, I came up with this solution:

%let implist_1=imp_inc;



%macro copyv1(list);

    %let nwords=%sysfunc(countw(&list));

    %do i=1 %to &nwords;

    %let varl=%scan(&list, &i);

        proc means data=have max noprint;

            var &varl;

            by famid implicate;

            where indid=1;

            OUTPUT OUT=copy max=max_&varl;  

        run;

        data want;

            set have;

            drop &varl;

        run;

        data want (drop=_TYPE_ _FREQ_);

            merge want copy;

            by famid implicate;

            rename max_&varl=&varl;

        run;

    %end;

%mend;

%copyv1(&imp_list1);

This works well for one or two variables. However it is tremendously slow once you do it for 400 variables in a data-set with the size of 1.5 GB.

I'm pretty sure there is a faster way to do this with some form of proc sql or first.var etc., but i'm relatively new to SAS and so far I couldn't come up with a better solution.

Thank you very much for your support.

Best regards

edited Nov 7 at 10:35

asked Nov 7 at 10:26

Luis Martins

106

Is the source data already sorted by FAMID so that first record is the one to replicate? What if the value to be replicated is missing on the first record?
– Tom
Nov 7 at 17:15

add a comment |

up vote
0
down vote

favorite

I have a data structure that looks like this:

DATA have ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 1000

1 2 2 2000

1 2 3 3000

1 2 4 4000

1 2 5 5000

1 3 1 .

1 3 2 .

1 3 3 .

1 3 4 .

1 3 5 .

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 .

2 2 2 .

2 2 3 .

2 2 4 .

2 2 5 .

2 3 1 41000

2 3 2 39000

2 3 3 24000

2 3 4 32000

2 3 5 53000

RUN ;

So, we have family id, individual id, implicate number and imputed income for each implicate.

DATA want ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 40000

1 2 2 25000

1 2 3 34000

1 2 4 23555

1 2 5 49850

1 3 1 40000

1 3 2 25000

1 3 3 34000

1 3 4 23555

1 3 5 49850

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 40000

2 2 2 45000

2 2 3 50000

2 2 4 34000

2 2 5 23500

2 3 1 40000

2 3 2 45000

2 3 3 50000

2 3 4 34000

2 3 5 23500

RUN ;

In this example I'm trying to replicate only one variable but in my project I will have to do this for dozens of variables.

So far, I came up with this solution:

%let implist_1=imp_inc;



%macro copyv1(list);

    %let nwords=%sysfunc(countw(&list));

    %do i=1 %to &nwords;

    %let varl=%scan(&list, &i);

        proc means data=have max noprint;

            var &varl;

            by famid implicate;

            where indid=1;

            OUTPUT OUT=copy max=max_&varl;  

        run;

        data want;

            set have;

            drop &varl;

        run;

        data want (drop=_TYPE_ _FREQ_);

            merge want copy;

            by famid implicate;

            rename max_&varl=&varl;

        run;

    %end;

%mend;

%copyv1(&imp_list1);

This works well for one or two variables. However it is tremendously slow once you do it for 400 variables in a data-set with the size of 1.5 GB.

I'm pretty sure there is a faster way to do this with some form of proc sql or first.var etc., but i'm relatively new to SAS and so far I couldn't come up with a better solution.

Thank you very much for your support.

Best regards

edited Nov 7 at 10:35

asked Nov 7 at 10:26

Luis Martins

106

Is the source data already sorted by FAMID so that first record is the one to replicate? What if the value to be replicated is missing on the first record?
– Tom
Nov 7 at 17:15

add a comment |

up vote
0
down vote

favorite

I have a data structure that looks like this:

DATA have ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 1000

1 2 2 2000

1 2 3 3000

1 2 4 4000

1 2 5 5000

1 3 1 .

1 3 2 .

1 3 3 .

1 3 4 .

1 3 5 .

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 .

2 2 2 .

2 2 3 .

2 2 4 .

2 2 5 .

2 3 1 41000

2 3 2 39000

2 3 3 24000

2 3 4 32000

2 3 5 53000

RUN ;

So, we have family id, individual id, implicate number and imputed income for each implicate.

DATA want ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 40000

1 2 2 25000

1 2 3 34000

1 2 4 23555

1 2 5 49850

1 3 1 40000

1 3 2 25000

1 3 3 34000

1 3 4 23555

1 3 5 49850

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 40000

2 2 2 45000

2 2 3 50000

2 2 4 34000

2 2 5 23500

2 3 1 40000

2 3 2 45000

2 3 3 50000

2 3 4 34000

2 3 5 23500

RUN ;

In this example I'm trying to replicate only one variable but in my project I will have to do this for dozens of variables.

So far, I came up with this solution:

%let implist_1=imp_inc;



%macro copyv1(list);

    %let nwords=%sysfunc(countw(&list));

    %do i=1 %to &nwords;

    %let varl=%scan(&list, &i);

        proc means data=have max noprint;

            var &varl;

            by famid implicate;

            where indid=1;

            OUTPUT OUT=copy max=max_&varl;  

        run;

        data want;

            set have;

            drop &varl;

        run;

        data want (drop=_TYPE_ _FREQ_);

            merge want copy;

            by famid implicate;

            rename max_&varl=&varl;

        run;

    %end;

%mend;

%copyv1(&imp_list1);

This works well for one or two variables. However it is tremendously slow once you do it for 400 variables in a data-set with the size of 1.5 GB.

I'm pretty sure there is a faster way to do this with some form of proc sql or first.var etc., but i'm relatively new to SAS and so far I couldn't come up with a better solution.

Thank you very much for your support.

Best regards

edited Nov 7 at 10:35

asked Nov 7 at 10:26

Luis Martins

106

I have a data structure that looks like this:

DATA have ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 1000

1 2 2 2000

1 2 3 3000

1 2 4 4000

1 2 5 5000

1 3 1 .

1 3 2 .

1 3 3 .

1 3 4 .

1 3 5 .

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 .

2 2 2 .

2 2 3 .

2 2 4 .

2 2 5 .

2 3 1 41000

2 3 2 39000

2 3 3 24000

2 3 4 32000

2 3 5 53000

RUN ;

So, we have family id, individual id, implicate number and imputed income for each implicate.

DATA want ; 

INPUT famid indid implicate imp_inc; 

CARDS ; 

1 1 1 40000

1 1 2 25000

1 1 3 34000

1 1 4 23555

1 1 5 49850

1 2 1 40000

1 2 2 25000

1 2 3 34000

1 2 4 23555

1 2 5 49850

1 3 1 40000

1 3 2 25000

1 3 3 34000

1 3 4 23555

1 3 5 49850

2 1 1 40000

2 1 2 45000

2 1 3 50000

2 1 4 34000

2 1 5 23500

2 2 1 40000

2 2 2 45000

2 2 3 50000

2 2 4 34000

2 2 5 23500

2 3 1 40000

2 3 2 45000

2 3 3 50000

2 3 4 34000

2 3 5 23500

RUN ;

In this example I'm trying to replicate only one variable but in my project I will have to do this for dozens of variables.

So far, I came up with this solution:

%let implist_1=imp_inc;



%macro copyv1(list);

    %let nwords=%sysfunc(countw(&list));

    %do i=1 %to &nwords;

    %let varl=%scan(&list, &i);

        proc means data=have max noprint;

            var &varl;

            by famid implicate;

            where indid=1;

            OUTPUT OUT=copy max=max_&varl;  

        run;

        data want;

            set have;

            drop &varl;

        run;

        data want (drop=_TYPE_ _FREQ_);

            merge want copy;

            by famid implicate;

            rename max_&varl=&varl;

        run;

    %end;

%mend;

%copyv1(&imp_list1);

This works well for one or two variables. However it is tremendously slow once you do it for 400 variables in a data-set with the size of 1.5 GB.

I'm pretty sure there is a faster way to do this with some form of proc sql or first.var etc., but i'm relatively new to SAS and so far I couldn't come up with a better solution.

Thank you very much for your support.

Best regards

sas rows replicate

edited Nov 7 at 10:35

asked Nov 7 at 10:26

Luis Martins

106

edited Nov 7 at 10:35

asked Nov 7 at 10:26

Luis Martins

106

edited Nov 7 at 10:35

asked Nov 7 at 10:26

Luis Martins

106

asked Nov 7 at 10:26

Luis Martins

106

asked Nov 7 at 10:26

Luis Martins

106

Is the source data already sorted by FAMID so that first record is the one to replicate? What if the value to be replicated is missing on the first record?
– Tom
Nov 7 at 17:15

add a comment |

Is the source data already sorted by FAMID so that first record is the one to replicate? What if the value to be replicated is missing on the first record?
– Tom
Nov 7 at 17:15

Is the source data already sorted by FAMID so that first record is the one to replicate? What if the value to be replicated is missing on the first record?
– Tom
Nov 7 at 17:15

add a comment |

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

Yes, this can be done in DATA step using a first. reference made available via the by statement.

data want;

  set have (keep=famid indid implicate imp_inc /* other vars */);



  by famid indid implicate; /* by implicate is so step logs an error (at run-time) if data not sorted */



  if first.famid then if indid ne 1 then abort;



  array across imp_inc           /* other vars */;

  array hold [1,5] _temporary_;  /* or [<n>,5] where <n> means the number of variables in the across array */



  if indid = 1 then do;          /* hold data for 1st individuals implicate across data */

    do _n_ = 1 to dim(across);

      hold[_n_,implicate] = across[_n_];  /* store info of each implicate of first individual */

    end;

  end;

  else do;

    do _n_ = 1 to dim(across);

      across[_n_] = hold[_n_,implicate];  /* apply 1st persons info to subsequent persons */

    end;

  end;

run;

The DATA step could be significantly faster due to single pass through data, however there is an internal processing cost associated with calculating all those pesky array addresses at run; time, and that cost could become impactful at some <n>

SQL is simpler syntax, clearer understanding and works if have data set is unsorted or has some peculiar sequencing in the by group.

answered Nov 7 at 13:22

Richard

7,38321125

With the SQL procedure my entire program takes 11 minutes per iteration to run; with this solution it takes 9 minutes. Thanks a lot!
– Luis Martins
Nov 8 at 10:59

Great! Welcome to Stack overflow, learn more about the question and answer process
– Richard
Nov 8 at 13:35

add a comment |

up vote
0
down vote

This is fairly straightforward with a bit of SQL:

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have a 

  left join (

    select * from have 

    group by famid 

    having indid = min(indid)

  ) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

The idea is to join the table to a subset of itself containing only the rows corresponding to the first individual within each family.

It is set up to pick the lowest numbered individual within each family, so it will work even if there is no row with indid = 1. If you are sure that there will always be such a row, you can use a slightly simpler query:

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have(sortedby = famid) a 

  left join have(where = (indid = 1)) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

Specifying sortedby = famid provides a hint to the query optimiser that it can skip one of the initial sorts required for the join, which may improve performance a bit.

edited Nov 7 at 11:11

answered Nov 7 at 10:48

user667489

7,69721226

Thank you for your reply. I don't know much about sql, but this seems to replicate all variables right? If I want to select a subset I would need to put select imp_inc instead of select *, right?
– Luis Martins
Nov 7 at 11:01

You can also specify a comma-separated list of variables, or use data set options, e.g. select * from have(keep = key1-key3 firstvar--lastvar)
– user667489
Nov 7 at 11:09

This works really nice thank you very much!
– Luis Martins
Nov 7 at 15:15

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53187588%2fsas-replicate-multiple-observations-across-rows%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

Yes, this can be done in DATA step using a first. reference made available via the by statement.

data want;

  set have (keep=famid indid implicate imp_inc /* other vars */);



  by famid indid implicate; /* by implicate is so step logs an error (at run-time) if data not sorted */



  if first.famid then if indid ne 1 then abort;



  array across imp_inc           /* other vars */;

  array hold [1,5] _temporary_;  /* or [<n>,5] where <n> means the number of variables in the across array */



  if indid = 1 then do;          /* hold data for 1st individuals implicate across data */

    do _n_ = 1 to dim(across);

      hold[_n_,implicate] = across[_n_];  /* store info of each implicate of first individual */

    end;

  end;

  else do;

    do _n_ = 1 to dim(across);

      across[_n_] = hold[_n_,implicate];  /* apply 1st persons info to subsequent persons */

    end;

  end;

run;

SQL is simpler syntax, clearer understanding and works if have data set is unsorted or has some peculiar sequencing in the by group.

answered Nov 7 at 13:22

Richard

7,38321125

With the SQL procedure my entire program takes 11 minutes per iteration to run; with this solution it takes 9 minutes. Thanks a lot!
– Luis Martins
Nov 8 at 10:59

Great! Welcome to Stack overflow, learn more about the question and answer process
– Richard
Nov 8 at 13:35

add a comment |

up vote
0
down vote

accepted

Yes, this can be done in DATA step using a first. reference made available via the by statement.

data want;

  set have (keep=famid indid implicate imp_inc /* other vars */);



  by famid indid implicate; /* by implicate is so step logs an error (at run-time) if data not sorted */



  if first.famid then if indid ne 1 then abort;



  array across imp_inc           /* other vars */;

  array hold [1,5] _temporary_;  /* or [<n>,5] where <n> means the number of variables in the across array */



  if indid = 1 then do;          /* hold data for 1st individuals implicate across data */

    do _n_ = 1 to dim(across);

      hold[_n_,implicate] = across[_n_];  /* store info of each implicate of first individual */

    end;

  end;

  else do;

    do _n_ = 1 to dim(across);

      across[_n_] = hold[_n_,implicate];  /* apply 1st persons info to subsequent persons */

    end;

  end;

run;

SQL is simpler syntax, clearer understanding and works if have data set is unsorted or has some peculiar sequencing in the by group.

answered Nov 7 at 13:22

Richard

7,38321125

With the SQL procedure my entire program takes 11 minutes per iteration to run; with this solution it takes 9 minutes. Thanks a lot!
– Luis Martins
Nov 8 at 10:59

Great! Welcome to Stack overflow, learn more about the question and answer process
– Richard
Nov 8 at 13:35

add a comment |

up vote
0
down vote

accepted

Yes, this can be done in DATA step using a first. reference made available via the by statement.

data want;

  set have (keep=famid indid implicate imp_inc /* other vars */);



  by famid indid implicate; /* by implicate is so step logs an error (at run-time) if data not sorted */



  if first.famid then if indid ne 1 then abort;



  array across imp_inc           /* other vars */;

  array hold [1,5] _temporary_;  /* or [<n>,5] where <n> means the number of variables in the across array */



  if indid = 1 then do;          /* hold data for 1st individuals implicate across data */

    do _n_ = 1 to dim(across);

      hold[_n_,implicate] = across[_n_];  /* store info of each implicate of first individual */

    end;

  end;

  else do;

    do _n_ = 1 to dim(across);

      across[_n_] = hold[_n_,implicate];  /* apply 1st persons info to subsequent persons */

    end;

  end;

run;

SQL is simpler syntax, clearer understanding and works if have data set is unsorted or has some peculiar sequencing in the by group.

answered Nov 7 at 13:22

Richard

7,38321125

Yes, this can be done in DATA step using a first. reference made available via the by statement.

data want;

  set have (keep=famid indid implicate imp_inc /* other vars */);



  by famid indid implicate; /* by implicate is so step logs an error (at run-time) if data not sorted */



  if first.famid then if indid ne 1 then abort;



  array across imp_inc           /* other vars */;

  array hold [1,5] _temporary_;  /* or [<n>,5] where <n> means the number of variables in the across array */



  if indid = 1 then do;          /* hold data for 1st individuals implicate across data */

    do _n_ = 1 to dim(across);

      hold[_n_,implicate] = across[_n_];  /* store info of each implicate of first individual */

    end;

  end;

  else do;

    do _n_ = 1 to dim(across);

      across[_n_] = hold[_n_,implicate];  /* apply 1st persons info to subsequent persons */

    end;

  end;

run;

SQL is simpler syntax, clearer understanding and works if have data set is unsorted or has some peculiar sequencing in the by group.

answered Nov 7 at 13:22

Richard

7,38321125

answered Nov 7 at 13:22

Richard

7,38321125

answered Nov 7 at 13:22

Richard

7,38321125

answered Nov 7 at 13:22

Richard

7,38321125

With the SQL procedure my entire program takes 11 minutes per iteration to run; with this solution it takes 9 minutes. Thanks a lot!
– Luis Martins
Nov 8 at 10:59

Great! Welcome to Stack overflow, learn more about the question and answer process
– Richard
Nov 8 at 13:35

add a comment |

With the SQL procedure my entire program takes 11 minutes per iteration to run; with this solution it takes 9 minutes. Thanks a lot!
– Luis Martins
Nov 8 at 10:59

Great! Welcome to Stack overflow, learn more about the question and answer process
– Richard
Nov 8 at 13:35

With the SQL procedure my entire program takes 11 minutes per iteration to run; with this solution it takes 9 minutes. Thanks a lot!
– Luis Martins
Nov 8 at 10:59

Great! Welcome to Stack overflow, learn more about the question and answer process
– Richard
Nov 8 at 13:35

add a comment |

up vote
0
down vote

This is fairly straightforward with a bit of SQL:

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have a 

  left join (

    select * from have 

    group by famid 

    having indid = min(indid)

  ) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

The idea is to join the table to a subset of itself containing only the rows corresponding to the first individual within each family.

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have(sortedby = famid) a 

  left join have(where = (indid = 1)) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

Specifying sortedby = famid provides a hint to the query optimiser that it can skip one of the initial sorts required for the join, which may improve performance a bit.

edited Nov 7 at 11:11

answered Nov 7 at 10:48

user667489

7,69721226

Thank you for your reply. I don't know much about sql, but this seems to replicate all variables right? If I want to select a subset I would need to put select imp_inc instead of select *, right?
– Luis Martins
Nov 7 at 11:01

You can also specify a comma-separated list of variables, or use data set options, e.g. select * from have(keep = key1-key3 firstvar--lastvar)
– user667489
Nov 7 at 11:09

This works really nice thank you very much!
– Luis Martins
Nov 7 at 15:15

add a comment |

up vote
0
down vote

This is fairly straightforward with a bit of SQL:

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have a 

  left join (

    select * from have 

    group by famid 

    having indid = min(indid)

  ) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

The idea is to join the table to a subset of itself containing only the rows corresponding to the first individual within each family.

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have(sortedby = famid) a 

  left join have(where = (indid = 1)) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

Specifying sortedby = famid provides a hint to the query optimiser that it can skip one of the initial sorts required for the join, which may improve performance a bit.

edited Nov 7 at 11:11

answered Nov 7 at 10:48

user667489

7,69721226

Thank you for your reply. I don't know much about sql, but this seems to replicate all variables right? If I want to select a subset I would need to put select imp_inc instead of select *, right?
– Luis Martins
Nov 7 at 11:01

You can also specify a comma-separated list of variables, or use data set options, e.g. select * from have(keep = key1-key3 firstvar--lastvar)
– user667489
Nov 7 at 11:09

This works really nice thank you very much!
– Luis Martins
Nov 7 at 15:15

add a comment |

up vote
0
down vote

This is fairly straightforward with a bit of SQL:

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have a 

  left join (

    select * from have 

    group by famid 

    having indid = min(indid)

  ) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

The idea is to join the table to a subset of itself containing only the rows corresponding to the first individual within each family.

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have(sortedby = famid) a 

  left join have(where = (indid = 1)) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

Specifying sortedby = famid provides a hint to the query optimiser that it can skip one of the initial sorts required for the join, which may improve performance a bit.

edited Nov 7 at 11:11

answered Nov 7 at 10:48

user667489

7,69721226

This is fairly straightforward with a bit of SQL:

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have a 

  left join (

    select * from have 

    group by famid 

    having indid = min(indid)

  ) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

The idea is to join the table to a subset of itself containing only the rows corresponding to the first individual within each family.

proc sql;

create table want as 

  select a.famid, a.indid, a.implicate, b.* from 

  have(sortedby = famid) a 

  left join have(where = (indid = 1)) b 

  on

        a.famid = b.famid 

    and a.implicate = b.implicate

  order by a.famid, a.indid, a.implicate

  ;

quit;

Specifying sortedby = famid provides a hint to the query optimiser that it can skip one of the initial sorts required for the join, which may improve performance a bit.

edited Nov 7 at 11:11

answered Nov 7 at 10:48

user667489

7,69721226

edited Nov 7 at 11:11

answered Nov 7 at 10:48

user667489

7,69721226

answered Nov 7 at 10:48

user667489

7,69721226

answered Nov 7 at 10:48

user667489

7,69721226

Thank you for your reply. I don't know much about sql, but this seems to replicate all variables right? If I want to select a subset I would need to put select imp_inc instead of select *, right?
– Luis Martins
Nov 7 at 11:01

You can also specify a comma-separated list of variables, or use data set options, e.g. select * from have(keep = key1-key3 firstvar--lastvar)
– user667489
Nov 7 at 11:09

This works really nice thank you very much!
– Luis Martins
Nov 7 at 15:15

add a comment |

Thank you for your reply. I don't know much about sql, but this seems to replicate all variables right? If I want to select a subset I would need to put select imp_inc instead of select *, right?
– Luis Martins
Nov 7 at 11:01

You can also specify a comma-separated list of variables, or use data set options, e.g. select * from have(keep = key1-key3 firstvar--lastvar)
– user667489
Nov 7 at 11:09

This works really nice thank you very much!
– Luis Martins
Nov 7 at 15:15

Thank you for your reply. I don't know much about sql, but this seems to replicate all variables right? If I want to select a subset I would need to put select imp_inc instead of select *, right?
– Luis Martins
Nov 7 at 11:01

You can also specify a comma-separated list of variables, or use data set options, e.g. select * from have(keep = key1-key3 firstvar--lastvar)
– user667489
Nov 7 at 11:09

This works really nice thank you very much!
– Luis Martins
Nov 7 at 15:15

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

NZP 5YoD Svkt5 oMFuznurVUrZ89tnDLlzkTVo2gS eKommbexZz

搜尋此網誌

Wsrtjtyk