R: regex from string to two dimensional data frame in one command?
I have a string s containing key-value pairs like the ones below, and I would like to construct a data frame from it:
    s="{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}"
    r1<-sapply(strsplit(s, "[^0-9_]+",as.numeric),as.numeric)
    r2<-sapply(strsplit(s, "[^a-z]+",as.numeric),as.character)
    d<-data.frame(id=r2,value=r1)
which gives:
    r1
         [,1]
    [1,]   NA
    [2,]  121
    [3,]  938
    [4,]  184
    [5,]  338
    [6,]   52

    r2
         [,1]
    [1,] ""
    [2,] "jj"
    [3,] "nn"
    [4,] "dt"
    [5,] "vb"
    [6,] "rb"

    d
      id value
    1       NA
    2 jj   121
    3 nn   938
    4 dt   184
    5 vb   338
    6 rb    52
First, I don't want to have the NA and "" entries after applying the regular expression. I think it should be something like {2,}, meaning match from the second occurrence on, but I cannot get that to work in R.
Another thing: given a data frame with a column like the one below:
        m
    1   {'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}
    2   {'#nn': 168, '#dt': 59, '#vb': 71, '#rb': 5, '#jj': 35}
    3   {'#jj': 18, '#nn': 100, '#dt': 23, '#vb': 52, '#rb': 11}
    4   {'#nn': 156, '#jj': 39, '#dt': 46, '#vb': 67, '#rb': 21}
    5   {'#nn': 112, '#dt': 39, '#vb': 57, '#rb': 8, '#jj': 32}
    6   {'#dt': 236, '#nn': 897, '#vb': 420, '#rb': 122, '#jj': 240}
    7   {'#nn': 316, '#rb': 25, '#dt': 66, '#vb': 112, '#jj': 81}
    8   {'#nn': 198, '#dt': 29, '#vb': 85, '#rb': 37, '#jj': 44}
    9   {'#rb': 30}
    10  {'#nn': 373, '#dt': 48, '#vb': 71, '#rb': 21, '#jj': 36}
    11  {'#nn': 49, '#dt': 17, '#vb': 23, '#rb': 11, '#jj': 8}
    12  {'#nn': 807, '#jj': 135, '#dt': 177, '#vb': 315, '#rb': 69}
I would like to iterate over each row and split the numerical values into columns named by key.
An example of a few rows showing how it should look:
I would use something that parses JSON, since that is what your data seems to be:
    s <- "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}"

    parse.one <- function(s) {
      require(rjson)
      # JSON requires double quotes, so swap the single quotes first
      v <- fromJSON(gsub("'", '"', s))
      # strip the leading '#' from each key and flatten the values
      data.frame(id = gsub("#", "", names(v)),
                 value = unlist(v, use.names = FALSE))
    }

    parse.one(s)
    #   id value
    # 1 jj   121
    # 2 nn   938
    # 3 dt   184
    # 4 vb   338
    # 5 rb    52
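If you would rather stay with regular expressions, as in the question, here is a minimal base R sketch. It extracts the matches themselves with regmatches and gregexpr instead of splitting on everything else, so there is no empty leading piece to become NA or "". It assumes keys consist only of lowercase letters and values are non-negative integers, as in the sample string; anything else would need a different pattern.

```r
s <- "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}"

# gregexpr finds every match position; regmatches pulls out the matched text.
# Assumption: keys are lowercase letters only, values are plain integers.
keys   <- regmatches(s, gregexpr("[a-z]+", s))[[1]]
values <- as.numeric(regmatches(s, gregexpr("[0-9]+", s))[[1]])

# One row per key-value pair
d <- data.frame(id = keys, value = values)
print(d)
```

The strsplit version in the question produces an empty first element because the string starts with characters that match the split pattern; extracting rather than splitting sidesteps that.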
For the second part of your question, I would pass a slightly modified version of the parse.one function through lapply, then let plyr's rbind.fill function align the pieces while filling missing values with NA:
    df <- data.frame(m = c(
      "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}",
      "{'#nn': 168, '#dt': 59, '#vb': 71, '#rb': 5, '#jj': 35}",
      "{'#jj': 18, '#nn': 100, '#dt': 23, '#vb': 52, '#rb': 11}",
      "{'#jj': 12, '#vb': 5}"
    ))

    parse.one <- function(s) {
      require(rjson)
      y <- fromJSON(gsub("'", '"', s))
      # drop the leading '#' so the keys become clean column names
      names(y) <- gsub("#", "", names(y))
      # a named list becomes a one-row data frame
      as.data.frame(y)
    }

    library(plyr)
    rbind.fill(lapply(df$m, parse.one))
    #    jj  nn  dt  vb rb
    # 1 121 938 184 338 52
    # 2  35 168  59  71  5
    # 3  18 100  23  52 11
    # 4  12  NA  NA   5 NA
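If installing rjson and plyr is not an option, the same alignment can be sketched in base R alone. This is only an illustration under the same assumptions as the sample data (lowercase-letter keys, non-negative integer values); the helper name parse.row is made up for this example. Each row is parsed into a named numeric vector, and indexing that vector by the full set of keys yields NA wherever a key is missing.

```r
df <- data.frame(m = c(
  "{'#jj': 121, '#nn': 938, '#dt': 184, '#vb': 338, '#rb': 52}",
  "{'#nn': 168, '#dt': 59, '#vb': 71, '#rb': 5, '#jj': 35}",
  "{'#jj': 18, '#nn': 100, '#dt': 23, '#vb': 52, '#rb': 11}",
  "{'#jj': 12, '#vb': 5}"
), stringsAsFactors = FALSE)

# Hypothetical helper: turn one string into a named numeric vector.
# Assumption: keys are lowercase letters only, values are plain integers.
parse.row <- function(s) {
  keys <- regmatches(s, gregexpr("[a-z]+", s))[[1]]
  vals <- as.numeric(regmatches(s, gregexpr("[0-9]+", s))[[1]])
  setNames(vals, keys)
}

rows <- lapply(df$m, parse.row)

# Union of all keys, in order of first appearance
all.keys <- unique(unlist(lapply(rows, names)))

# Indexing by name returns NA for keys a row does not have
aligned <- lapply(rows, function(r) unname(r[all.keys]))
out <- as.data.frame(do.call(rbind, aligned))
names(out) <- all.keys
print(out)
```

Unlike the rjson approach, this does not validate the input as JSON, so it is only reasonable when the strings are known to follow the simple pattern shown.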