R - Split-sample design

- May 15, 2012

I am working on a large data set containing 9,000 observations belonging to different groups. I would like to use a method called split-sample design to analyze the data. Let me explain in detail what I want to do. My data has the following structure:

    groupid  performance   commitment   affect   size
    1234     5             4            2        2
    1234     6             8            9        2
    2235     4             3            2        5
    2235     4             3            2        5
    2235     2             1            7        5
    2235     2             1            7        5
    2235     2             6            10       5
    3678     3             5            5        4
    3678     7             3            5        4
    3678     5             2            6        4
    3678     1             4            6        4

Now I would like to aggregate the data in the following way: for each group, I want to use the average performance score of the first half of the group and the average commitment and affect scores of the second half of the group to create one new observation. For uneven group sizes, I want to drop one random observation within the group (e.g. the last observation in the group) to create an even group size. However, I want to do this in two steps. First, the data should look like this:

    groupid  performance   commitment   affect   size
    1234     5             8            9        2
    2235     4             1            7        5
    2235     4             1            7        5
    3678     3             2            6        4
    3678     7             4            6        4

In the next step, I would like to aggregate the data. The new data set should have one observation per group and look like this:

    groupid  performance   commitment   affect   size
    1234     5             8            9        2
    2235     4             1            7        5
    3678     5             3            6        4

Again, please note that the last observation of group 2235 was dropped, since the group size is an uneven number.

Is there a package out there that can split and aggregate data in this way? If not, how would I go ahead and code this? I would be grateful for any advice, since I have no idea how to approach this elegantly, other than writing a bunch of for loops.

Here is the code for the above example:

    groupid <- c(1234, 1234, 2235, 2235, 2235, 2235, 2235, 3678, 3678, 3678, 3678)
    performance <- c(5, 6, 4, 4, 2, 2, 2, 3, 7, 5, 1)
    commitment <- c(4, 8, 3, 3, 1, 1, 6, 5, 3, 2, 4)
    affect <- c(2, 9, 2, 2, 7, 7, 10, 5, 5, 6, 6)
    size <- c(2, 2, 5, 5, 5, 5, 5, 4, 4, 4, 4)
    mydata <- data.frame(groupid, performance, commitment, affect, size)

Many thanks!!

Here is a solution:

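    library(plyr)

    # Per group: mean performance over the first half of the rows,
    # mean commitment and affect over the second half.
    # length(groupid)/2 is 2.5 for the five-row group 2235; as the output
    # shows, head() then uses 2 rows and tail() uses 3.
    mydata1 <- ddply(mydata, .(groupid), summarize,
                     aveper = mean(head(performance, length(groupid) / 2)),
                     avecom = mean(tail(commitment, length(groupid) / 2)),
                     aveaff = mean(tail(affect, length(groupid) / 2)),
                     avesiz = mean(size))

    > mydata1
      groupid aveper avecom aveaff avesiz
    1    1234      5  8.000      9      2
    2    2235      4  2.667      8      5
    3    3678      5  3.000      6      4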
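For group 2235 the output above gives avecom = 2.667 and aveaff = 8, whereas the question asks for 1 and 7. The difference arises because the five-row group is not trimmed to an even size before averaging. What follows is only a minimal base R sketch of one way to get the requested numbers, not the answerer's method. It assumes the "random" observation to drop is simply the last row of an odd-sized group and that every group has at least two rows. The helper name `split_half_means` is made up for this illustration.

```r
# Example data from the question
mydata <- data.frame(
  groupid     = c(1234, 1234, 2235, 2235, 2235, 2235, 2235, 3678, 3678, 3678, 3678),
  performance = c(5, 6, 4, 4, 2, 2, 2, 3, 7, 5, 1),
  commitment  = c(4, 8, 3, 3, 1, 1, 6, 5, 3, 2, 4),
  affect      = c(2, 9, 2, 2, 7, 7, 10, 5, 5, 6, 6),
  size        = c(2, 2, 5, 5, 5, 5, 5, 4, 4, 4, 4)
)

# Hypothetical helper: build one aggregated row from one group's rows
split_half_means <- function(g) {
  half   <- nrow(g) %/% 2          # integer half; an odd group leaves its last row unused
  first  <- seq_len(half)          # rows 1 .. half
  second <- half + seq_len(half)   # rows half+1 .. 2*half
  data.frame(
    groupid     = g$groupid[1],
    performance = mean(g$performance[first]),
    commitment  = mean(g$commitment[second]),
    affect      = mean(g$affect[second]),
    size        = g$size[1]
  )
}

# Apply the helper to each group and stack the one-row results
result <- do.call(rbind, lapply(split(mydata, mydata$groupid), split_half_means))
rownames(result) <- NULL
result
```

Using integer division (`%/%`) avoids passing a fractional count such as 2.5 to `head()` and `tail()`, which is what makes the two halves unequal in the plyr call.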

Update:

    mydata2 <- ddply(mydata, .(groupid), transform,
                     aveper = mean(head(performance, length(groupid) / 2)),
                     avecom = mean(tail(commitment, length(groupid) / 2)),
                     aveaff = mean(tail(affect, length(groupid) / 2)),
                     avesiz = mean(size),
                     lengr = length(groupid))

    mydata2 <- mydata2[-7, ]  # assumes you have taken care of uneven groups

    > mydata2
       groupid performance commitment affect size aveper avecom aveaff avesiz lengr
    1     1234           5          4      2    2      5  8.000      9      2     2
    2     1234           6          8      9    2      5  8.000      9      2     2
    3     2235           4          3      2    5      4  2.667      8      5     5
    4     2235           4          3      2    5      4  2.667      8      5     5
    5     2235           2          1      7    5      4  2.667      8      5     5
    6     2235           2          1      7    5      4  2.667      8      5     5
    8     3678           3          5      5    4      5  3.000      6      4     4
    9     3678           7          3      5    4      5  3.000      6      4     4
    10    3678           5          2      6    4      5  3.000      6      4     4
    11    3678           1          4      6    4      5  3.000      6      4     4

    # Keep the first half of each group's rows (base R Map, capital M)
    mydata3 <- Map(function(x) head(mydata2[mydata2$groupid == x, ],
                                    head(mydata2$lengr[which(mydata2$groupid == x)], 1) / 2),
                   unique(mydata2$groupid))

    library(plyr)
    mydata4 <- ldply(mydata3)

    mydata5 <- mydata4[, c(1, 6:9)]
    > mydata5
      groupid aveper avecom aveaff avesiz
    1    1234      5  8.000      9      2
    2    2235      4  2.667      8      5
    3    2235      4  2.667      8      5
    4    3678      5  3.000      6      4
    5    3678      5  3.000      6      4
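In the update, the group averages are computed before row 7 is removed, so group 2235 still shows 2.667 and 8, and every row within a group carries the same group means rather than the per-row pairing from the question's intermediate table. The two steps from the question can also be sketched in base R, under the same assumption that the dropped observation of an odd-sized group is its last row. The names `pair_halves`, `step1` and `step2` are hypothetical and chosen only for this illustration.

```r
# Example data from the question
mydata <- data.frame(
  groupid     = c(1234, 1234, 2235, 2235, 2235, 2235, 2235, 3678, 3678, 3678, 3678),
  performance = c(5, 6, 4, 4, 2, 2, 2, 3, 7, 5, 1),
  commitment  = c(4, 8, 3, 3, 1, 1, 6, 5, 3, 2, 4),
  affect      = c(2, 9, 2, 2, 7, 7, 10, 5, 5, 6, 6),
  size        = c(2, 2, 5, 5, 5, 5, 5, 4, 4, 4, 4)
)

# Hypothetical helper: pair row i of the first half (performance)
# with row i of the second half (commitment, affect)
pair_halves <- function(g) {
  half   <- nrow(g) %/% 2          # an odd group leaves its last row unused
  first  <- seq_len(half)
  second <- half + seq_len(half)
  data.frame(
    groupid     = g$groupid[first],
    performance = g$performance[first],
    commitment  = g$commitment[second],
    affect      = g$affect[second],
    size        = g$size[first]
  )
}

# Step 1: the intermediate data set, half as many rows per group
step1 <- do.call(rbind, lapply(split(mydata, mydata$groupid), pair_halves))
rownames(step1) <- NULL

# Step 2: one row per group, averaging every column
step2 <- aggregate(cbind(performance, commitment, affect, size) ~ groupid,
                   data = step1, FUN = mean)
step1
step2
```

Keeping `step1` as a separate object makes it possible to inspect the paired rows before they are collapsed to one observation per group.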
