R: regex from string to two-dimensional data frame in one command?

- February 15, 2013

I have a string s containing key-value pairs like the one below, and I would like to construct a data frame from it:

    s = "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}"
    r1 <- sapply(strsplit(s, "[^0-9_]+", as.numeric), as.numeric)
    r2 <- sapply(strsplit(s, "[^a-z]+", as.numeric), as.character)
    d <- data.frame(id = r2, value = r1)

which gives:

    r1
         [,1]
    [1,]   NA
    [2,]  121
    [3,]  938
    [4,]  184
    [5,]  338
    [6,]   52

    r2
         [,1]
    [1,] ""
    [2,] "jj"
    [3,] "nn"
    [4,] "dt"
    [5,] "vb"
    [6,] "rb"

    d
      id value
    1       NA
    2 jj   121
    3 nn   938
    4 dt   184
    5 vb   338
    6 rb    52

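First, I would like to avoid getting the NA and "" entries after using the regular expression. I think the pattern should use something like {2,}, meaning "match from the second occurrence on", but I cannot get that to work in R.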

Another thing I would like to do: given a data frame with a column like the one below:

                                                                  m
    1   {'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}
    2       {'#nn': 168, '#dt': 59, '#vb': 71, '#rb': 5, '#jj': 35}
    3      {'#jj': 18, '#nn': 100, '#dt': 23, '#vb': 52, '#rb': 11}
    4      {'#nn': 156, '#jj': 39, '#dt': 46, '#vb': 67, '#rb': 21}
    5       {'#nn': 112, '#dt': 39, '#vb': 57, '#rb': 8, '#jj': 32}
    6  {'#dt': 236, '#nn': 897, '#vb': 420, '#rb': 122, '#jj': 240}
    7     {'#nn': 316, '#rb': 25, '#dt': 66, '#vb': 112, '#jj': 81}
    8      {'#nn': 198, '#dt': 29, '#vb': 85, '#rb': 37, '#jj': 44}
    9                                                   {'#rb': 30}
    10     {'#nn': 373, '#dt': 48, '#vb': 71, '#rb': 21, '#jj': 36}
    11       {'#nn': 49, '#dt': 17, '#vb': 23, '#rb': 11, '#jj': 8}
    12  {'#nn': 807, '#jj': 135, '#dt': 177, '#vb': 315, '#rb': 69}

I would like to iterate over each row and split the numerical values into columns named by key.

An example of a few rows showing what the result should look like:

(image not available)

I would use something that parses JSON, which is what your data seems to be:

    s <- "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}"

    parse.one <- function(s) {
      require(rjson)
      # JSON requires double quotes, so swap the single quotes first
      v <- fromJSON(gsub("'", '"', s))
      data.frame(id = gsub("#", "", names(v)),
                 value = unlist(v, use.names = FALSE))
    }

    parse.one(s)
    #   id value
    # 1 jj   121
    # 2 nn   938
    # 3 dt   184
    # 4 vb   338
    # 5 rb    52
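If you would rather stay close to the original regex attempt, here is a minimal base-R sketch. It assumes the keys are always lowercase letters and the values are non-negative integers, as in the sample string. The NA and "" in the question appear because strsplit returns an empty first piece when the string starts with a separator match (here the leading {'#). Extracting the matches with gregexpr and regmatches, instead of splitting on everything else, avoids that:

```r
s <- "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}"

# Pull out the matches themselves rather than splitting on the non-matches,
# so there is no empty leading piece to clean up afterwards.
keys   <- regmatches(s, gregexpr("[a-z]+", s))[[1]]
values <- as.numeric(regmatches(s, gregexpr("[0-9]+", s))[[1]])

d <- data.frame(id = keys, value = values)
```

This is only as robust as the two patterns: a key containing digits, or a negative or decimal value, would need different expressions, which is one reason the JSON route is the safer default.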

For the second part of the question, I would pass a slightly modified version of the parse.one function through lapply, then let plyr's rbind.fill function align the pieces while filling missing values with NA:

    df <- data.frame(m = c(
      "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}",
      "{'#nn': 168, '#dt': 59, '#vb': 71, '#rb': 5, '#jj': 35}",
      "{'#jj': 18, '#nn': 100, '#dt': 23, '#vb': 52, '#rb': 11}",
      "{'#jj': 12, '#vb': 5}"
    ))

    parse.one <- function(s) {
      require(rjson)
      y <- fromJSON(gsub("'", '"', s))
      names(y) <- gsub("#", "", names(y))
      # A named list of scalars becomes a one-row data frame
      as.data.frame(y)
    }

    library(plyr)
    rbind.fill(lapply(df$m, parse.one))
    #    jj  nn  dt  vb rb
    # 1 121 938 184 338 52
    # 2  35 168  59  71  5
    # 3  18 100  23  52 11
    # 4  12  NA  NA   5 NA
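If installing rjson and plyr is not an option, the same alignment can be sketched in base R. This is an illustration under the same assumptions as before (lowercase keys, non-negative integer values), not a replacement for the approach above. The helper parse.row and the variable names are made up for this example. It relies on the fact that indexing a named vector by a name it does not contain returns NA:

```r
m <- c(
  "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}",
  "{'#nn': 168, '#dt': 59, '#vb': 71, '#rb': 5, '#jj': 35}",
  "{'#jj': 12, '#vb': 5}"
)

# Hypothetical helper: turn one string into a named numeric vector.
parse.row <- function(s) {
  keys   <- regmatches(s, gregexpr("[a-z]+", s))[[1]]
  values <- as.numeric(regmatches(s, gregexpr("[0-9]+", s))[[1]])
  setNames(values, keys)
}

rows     <- lapply(m, parse.row)
all.keys <- unique(unlist(lapply(rows, names)))

# Reorder every row to the same key order; absent keys come back as NA.
aligned <- lapply(rows, function(r) unname(r[all.keys]))
wide    <- as.data.frame(do.call(rbind, aligned))
names(wide) <- all.keys
```

The columns come out in the order the keys are first seen, so with this input they are jj, nn, dt, vb, rb, and the last row has NA for nn, dt and rb.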
