R: nested list from JSON/XML (clinicaltrials.gov) to data.frame (tidy)

Asked by: ava | Asked: 2/28/2021 | Last edited by: ava | Updated: 11/5/2023 | Views: 406

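Q:

Aim

For university research, I am trying to process clinical study data that is publicly available from clinicaltrials.gov.

For reproducibility, I want to work directly with the downloaded JSON or XML files (rather than retrieving the data through the web API, which has already been described: how-to-get-data-out-of-nested-xml-structure).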

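Update 1: The structure of the JSON files is posted here

Update 2: The structure of the XML files is posted here

Update 3:

I think tidyjson::read_json and tidyjson::spread_all do the trick! See the answer section.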

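What I need

For my workflow, I need to convert the data into data.frames (tidy data.frames would be even better). I prefer JSON, but I would also be happy with a solution for the XML format.

Test data

The nested list below is what jsonlite::fromJSON("NCT0455805.json") produced for one of the downloaded JSON files: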

test <- list(FullStudy = list(Rank = 254369L, Study = list(ProtocolSection = list(
    IdentificationModule = list(NCTId = "NCT01455805", OrgStudyIdInfo = list(
        OrgStudyId = "SS2011UK"), Organization = list(OrgFullName = "Spinal Simplicity LLC", 
        OrgClass = "INDUSTRY"), BriefTitle = "Minuteman Spinal Fusion Implant Versus Surgical Decompression for Lumbar Spinal Stenosis", 
        OfficialTitle = "Efficacy and Quality of Life Following Treatment of Lumbar Spinal Stenosis, Spondylolisthesis or Degenerative Disc Disease With the Minuteman Interspinous Interlaminar Fusion Implant Versus Surgical Decompression"), 
    StatusModule = list(StatusVerifiedDate = "October 2020", 
        OverallStatus = "Active, not recruiting", ExpandedAccessInfo = list(
            HasExpandedAccess = "No"), StartDateStruct = list(
            StartDate = "June 2012"), PrimaryCompletionDateStruct = list(
            PrimaryCompletionDate = "March 2024", PrimaryCompletionDateType = "Anticipated"), 
        CompletionDateStruct = list(CompletionDate = "March 2024", 
            CompletionDateType = "Anticipated"), StudyFirstSubmitDate = "October 13, 2011", 
        StudyFirstSubmitQCDate = "October 18, 2011", StudyFirstPostDateStruct = list(
            StudyFirstPostDate = "October 20, 2011", StudyFirstPostDateType = "Estimate"), 
        LastUpdateSubmitDate = "October 22, 2020", LastUpdatePostDateStruct = list(
            LastUpdatePostDate = "October 26, 2020", LastUpdatePostDateType = "Actual")), 
    SponsorCollaboratorsModule = list(ResponsibleParty = list(
        ResponsiblePartyType = "Sponsor"), LeadSponsor = list(
        LeadSponsorName = "Spinal Simplicity LLC", LeadSponsorClass = "INDUSTRY"), 
        CollaboratorList = list(Collaborator = list(list(CollaboratorName = "The Leeds Teaching Hospitals NHS Trust", 
            CollaboratorClass = "OTHER")))), OversightModule = list(
        OversightHasDMC = "Yes"), DescriptionModule = list(BriefSummary = "Lumbar spinal stenosis (LSS), is a common disorder of narrowing of the spinal canal in the lower part of the back. This causes discomfort in the legs when standing or walking because of pressure on the spinal nerves.There are several treatment options for LSS including physiotherapy, lumbar surgical decompression procedures such as laminectomy, Foraminotomy, Discectomy and more recently devices for interspinous distraction such as the XSTOP® and from May 2011 Minuteman\231.\n\nSurgical decompression for LSS involves the removal of excess bone, ligament, and soft-tissue allowing more room for the nerves. The operation is usually preformed under general anaesthetic and with an average stay in hospital for 2-3 nights. Whereas the Minuteman\231 implant is preformed as a day case under local or general anaesthetic and involves implanting the device into the space between two back bones to relieve pressure on the nerves and, therefore, pain in the legs.\n\nThis is a multi centred (four sites) randomised controlled trial with a total sample of 50 participants after obtaining their informed consent. Participants will attend the pain clinic at the Hospitals for a baseline visit where they will be randomised with a ratio of 1:1 to receive either the Minuteman\231 Interspinous interlaminar fusion Implant or standard surgical decompression for the treatment of lumbar spinal stenosis (LSS). Following randomisation arrangements will be made for the participant to receive the randomised treatment. If allocated to Minuteman\231 Implant, the treatment will be conducted by the Pain Specialist identified at the site. If allocated to surgical decompression, the treatment will be conducted by the neuro/spinal-surgeon identified at the site. Participates will be followed up regularly for 60 months post implant to assess clinical efficacy, safety, participants function and quality of life of each treatment.", 
        DetailedDescription = "This is a prospective randomised study monitoring patients for up to 5 years post treatment. Only patients who have an appropriately diagnosed Lumbar Spinal Stenosis with intermittent claudication with/without low back pain, with no adequate symptomatic relief after at least 6 months of conservative treatment will be asked to give consent to be involved. Potential participants will be approached for enrollment 17days before the planned baseline visit. Patients will be given oral and written information about the trial as well as the patient information leaflet for the study. If informed consent is given their participation in this study will be for a maximum of 5 years."), 
    ConditionsModule = list(ConditionList = list(Condition = c("Lumbar Spinal Stenosis", 
    "Spondylolisthesis", "Degenerative Disc Disease"))), DesignModule = list(
        StudyType = "Interventional", PhaseList = list(Phase = "Not Applicable"), 
        DesignInfo = list(DesignAllocation = "Randomized", DesignInterventionModel = "Parallel Assignment", 
            DesignPrimaryPurpose = "Treatment", DesignMaskingInfo = list(
                DesignMasking = "None (Open Label)")), EnrollmentInfo = list(
            EnrollmentCount = "50", EnrollmentType = "Anticipated")), 
    ArmsInterventionsModule = list(ArmGroupList = list(ArmGroup = list(
        list(ArmGroupLabel = "Minuteman Fusion Implant", ArmGroupType = "Active Comparator", 
            ArmGroupDescription = "Minuteman\231 interspinous interlaminar fusion Implant (interspinous interlaminar fusion device) which gained CE Mark approval in May 2011", 
            ArmGroupInterventionList = list(ArmGroupInterventionName = "Device: Minuteman Fusion Implant")), 
        list(ArmGroupLabel = "Surgical decompression", ArmGroupType = "Other", 
            ArmGroupDescription = "Surgical decompression refers to the following operations Laminectomy, Foraminotomy, Discectomy or any other surgical procedure that the clinician feels is relevant for the decompression of lumbar spinal stenosis.", 
            ArmGroupInterventionList = list(ArmGroupInterventionName = "Procedure: surgical decompression")))), 
        InterventionList = list(Intervention = list(list(InterventionType = "Device", 
            InterventionName = "Minuteman Fusion Implant", InterventionDescription = "The Minuteman\231 interspinous interlaminar fusion device consists of a central threaded portion that has a two-part wing plate hinged near its proximal end, with spikes on the extended distal end of the wing plate, and a multi-spiked end cap plate that is located at the distal end of the device and is retained and tightened in place with a locking hex nut. Compression between the spiked wing plate and the spiked end cap plate serves to fix the spinous processes in place and to facilitate fusion, together with bone graft fusion material placed within the device. The threaded external body has been designed to provide ease of distraction and insertion via a minimally invasive surgical procedure.", 
            InterventionArmGroupLabelList = list(InterventionArmGroupLabel = "Minuteman Fusion Implant"), 
            InterventionOtherNameList = list(InterventionOtherName = "The Minuteman\231 interspinous interlaminar fusion device")), 
            list(InterventionType = "Procedure", InterventionName = "surgical decompression", 
                InterventionDescription = "Surgical decompression refers to the following operations Laminectomy, Foraminotomy, Discectomy or any other surgical procedure that the clinician feels is relevant for the decompression of lumbar spinal stenosis", 
                InterventionArmGroupLabelList = list(InterventionArmGroupLabel = "Surgical decompression"))))), 
    OutcomesModule = list(PrimaryOutcomeList = list(PrimaryOutcome = list(
        list(PrimaryOutcomeMeasure = "Change from baseline of clinical efficacy up to 60 months post procedure", 
            PrimaryOutcomeDescription = "These include:\n\nVisual Analogue Scale (VAS) pain scores Leg Pain\nVisual Analogue Scale (VAS) pain scores Back Pain\nOswestry Disability Index (ODI)\nZurich Claudication Questionnaire (ZCQ)\nAssessment of Physical Function via distance walked in 5 minutes and number of repetitions of sitting to standing in 1 minute.\n\nThe main outcome will be a comparison between treatment groups based on the change from baseline at each follow-up visit for each of the measures listed above.", 
            PrimaryOutcomeTimeFrame = "8 weeks and up to 60 months post procedure."))), 
        SecondaryOutcomeList = list(SecondaryOutcome = list(list(
            SecondaryOutcomeMeasure = "measures of quality of life", 
            SecondaryOutcomeDescription = "These include:\n\nChange in functional status questionnaire from baseline\nParticipants global impression of change from baseline (PGIC)\nClinician's global Impression of change from baseline (CGIC)\nEmployment status", 
            SecondaryOutcomeTimeFrame = "8 weeks and up to 60 months post procedure."), 
            list(SecondaryOutcomeMeasure = "Adverse events related to device and procedure", 
                SecondaryOutcomeTimeFrame = "safety to be assessed at 8 weeks and up to 60 months post procedure.")))), 
    EligibilityModule = list(EligibilityCriteria = "Inclusion Criteria:\n\nIs male or a non pregnant female aged 18years or older\nBMI = 35kg/m2\nHas chronic leg pain with or without back pain of greater than 6 months duration,which is partially or completely relieved by either sitting or adopting a flexed posture and who are suitable in the clinicians opinion for posterior lumbar surgery\nPre-operative ODI score = 20%\nPre-operative ZCQ Physical Function Domain =2\nPre-operative VAS Leg pain score = 4\nHas completed at least 6 months of conservative treatment without obtaining adequate symptomatic relief or has worsening neurological symptoms.\nHas degenerative changes at 1 or 2 levels confirmed by MRI or CT Myelogram within the last 12 months) with one or more of the following:\nLumbar spinal stenosis with intermittent neurogenic claudication\nDegeneration of the disc (as evidenced by imaging on MRI)\nAnnular thickening\nDegenerative Spondylolisthesis = Meyerding Grade 1\nThickening of ligamentum flavum\n\nExclusion Criteria:\n\nFixed motor deficit\nHas undergone previous lumbar spinal surgery\nIs unwilling or unable to give consent or adhere to the follow up schedule\nHas active infection or metastatic disease\nHas spondylolisthesis > grade 1\nHas neurogenic bladder or bowel disease\nHas a history of Osteopenia and or Osteoporosis. Evaluation of possible Osteopenia and or Osteoporosis will be conducted via a bone density scan prior to randomisation if ANY of the Bone Mass Evaluation criteria is met\nPatients who are not deemed fit for anaesthesia/major surgery due to underlying medical condition", 
        HealthyVolunteers = "No", Gender = "All", MinimumAge = "18 Years", 
        StdAgeList = list(StdAge = c("Adult", "Older Adult"))), 
    ContactsLocationsModule = list(OverallOfficialList = list(
        OverallOfficial = list(list(OverallOfficialName = "Ganesan Baranidharan, Dr", 
            OverallOfficialAffiliation = "Leeds Teaching Hospitals NHS Trust", 
            OverallOfficialRole = "Principal Investigator"))), 
        LocationList = list(Location = list(list(LocationFacility = "Taunton & Somerset NHS Foundation Trust of Musgrove Park Hospital", 
            LocationCity = "Taunton", LocationState = "Somerset", 
            LocationZip = "TA1 5DA", LocationCountry = "United Kingdom"), 
            list(LocationFacility = "The Ipswich Hospital NHS Trust", 
                LocationCity = "Ipswich", LocationState = "Suffolk", 
                LocationZip = "IP4 5PD", LocationCountry = "United Kingdom"), 
            list(LocationFacility = "Pain and Interventional Neuromodulation Research Group, Pain Management Dept, Seacroft Hospital, Leeds Teaching Hospitals NHS Trust", 
                LocationCity = "Leeds", LocationState = "West Yorkshire", 
                LocationZip = "LS14 6UH", LocationCountry = "United Kingdom"), 
            list(LocationFacility = "The Dudley Group NHS Foundation Trust, Russell Hall Hospital", 
                LocationCity = "Birmingham", LocationZip = "DY1 2HQ", 
                LocationCountry = "United Kingdom"))))), DerivedSection = list(
    MiscInfoModule = list(VersionHolder = "February 26, 2021"), 
    ConditionBrowseModule = list(ConditionMeshList = list(ConditionMesh = list(
        list(ConditionMeshId = "D000013130", ConditionMeshTerm = "Spinal Stenosis"), 
        list(ConditionMeshId = "D000055959", ConditionMeshTerm = "Intervertebral Disc Degeneration"), 
        list(ConditionMeshId = "D000013168", ConditionMeshTerm = "Spondylolisthesis"), 
        list(ConditionMeshId = "D000003251", ConditionMeshTerm = "Constriction, Pathologic"))), 
        ConditionAncestorList = list(ConditionAncestor = list(
            list(ConditionAncestorId = "D000020763", ConditionAncestorTerm = "Pathological Conditions, Anatomical"), 
            list(ConditionAncestorId = "D000013122", ConditionAncestorTerm = "Spinal Diseases"), 
            list(ConditionAncestorId = "D000001847", ConditionAncestorTerm = "Bone Diseases"), 
            list(ConditionAncestorId = "D000009140", ConditionAncestorTerm = "Musculoskeletal Diseases"), 
            list(ConditionAncestorId = "D000013169", ConditionAncestorTerm = "Spondylolysis"), 
            list(ConditionAncestorId = "D000055009", ConditionAncestorTerm = "Spondylosis"))), 
        ConditionBrowseLeafList = list(ConditionBrowseLeaf = list(
            list(ConditionBrowseLeafId = "M26992", ConditionBrowseLeafName = "Intervertebral Disc Degeneration", 
                ConditionBrowseLeafAsFound = "Degenerative Disc Disease", 
                ConditionBrowseLeafRelevance = "high"), list(
                ConditionBrowseLeafId = "M14546", ConditionBrowseLeafName = "Spondylolisthesis", 
                ConditionBrowseLeafAsFound = "Spondylolisthesis", 
                ConditionBrowseLeafRelevance = "high"), list(
                ConditionBrowseLeafId = "M14510", ConditionBrowseLeafName = "Spinal Stenosis", 
                ConditionBrowseLeafAsFound = "Spinal Stenosis", 
                ConditionBrowseLeafRelevance = "high"), list(
                ConditionBrowseLeafId = "M5058", ConditionBrowseLeafName = "Constriction, Pathologic", 
                ConditionBrowseLeafAsFound = "Stenosis", ConditionBrowseLeafRelevance = "high"), 
            list(ConditionBrowseLeafId = "M21103", ConditionBrowseLeafName = "Pathological Conditions, Anatomical", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "M14502", 
                ConditionBrowseLeafName = "Spinal Diseases", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "M3708", 
                ConditionBrowseLeafName = "Bone Diseases", ConditionBrowseLeafRelevance = "low"), 
            list(ConditionBrowseLeafId = "M10680", ConditionBrowseLeafName = "Musculoskeletal Diseases", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "M14547", 
                ConditionBrowseLeafName = "Spondylolysis", ConditionBrowseLeafRelevance = "low"), 
            list(ConditionBrowseLeafId = "M26580", ConditionBrowseLeafName = "Spondylosis", 
                ConditionBrowseLeafRelevance = "low"), list(ConditionBrowseLeafId = "T6038", 
                ConditionBrowseLeafName = "Quality of Life", 
                ConditionBrowseLeafRelevance = "low"))), ConditionBrowseBranchList = list(
            ConditionBrowseBranch = list(list(ConditionBrowseBranchAbbrev = "BC05", 
                ConditionBrowseBranchName = "Muscle, Bone, and Cartilage Diseases"), 
                list(ConditionBrowseBranchAbbrev = "All", ConditionBrowseBranchName = "All Conditions"), 
                list(ConditionBrowseBranchAbbrev = "BC23", ConditionBrowseBranchName = "Symptoms and General Pathology"), 
                list(ConditionBrowseBranchAbbrev = "BXM", ConditionBrowseBranchName = "Behaviors and Mental Disorders"))))))))

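What I have achieved so far

I can easily read a batch of JSON files into a list as shown below (paths$path is a vector with the paths to the files; x is a single path inside the anonymous function):

library(parallel)
library(jsonlite)

# read every file on a worker process; one core is left free
cl <- makeCluster(detectCores() - 1)
json_list <- parLapply(cl, paths$path, function(x) jsonlite::fromJSON(x))
stopCluster(cl)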
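For small batches, a sequential version is easier to debug than the cluster call. The following is only a sketch: the two temporary files, their contents, and the NCT numbers in them are hypothetical stand-ins for real downloads, and naming the list elements after the file names is an added convenience, not something from the question.

```r
library(jsonlite)

# Two tiny stand-in files (hypothetical content) in a temporary directory
study_dir <- tempfile("studies_")
dir.create(study_dir)
writeLines('{"FullStudy": {"Rank": 1, "Study": {"NCTId": "NCT00000001"}}}',
           file.path(study_dir, "NCT00000001.json"))
writeLines('{"FullStudy": {"Rank": 2, "Study": {"NCTId": "NCT00000002"}}}',
           file.path(study_dir, "NCT00000002.json"))

# Same shape as in the question: a data.frame with a `path` column
paths <- data.frame(
  path = list.files(study_dir, pattern = "\\.json$", full.names = TRUE),
  stringsAsFactors = FALSE
)

# Sequential equivalent of the parLapply() call
json_list <- lapply(paths$path, jsonlite::fromJSON)

# Name each element after its file so a study can be looked up by ID
names(json_list) <- tools::file_path_sans_ext(basename(paths$path))

names(json_list)
```

With named elements, a single study can be inspected via json_list[["NCT00000002"]] instead of a positional index.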

What I have tried

I tried the option simplifyDataFrame = TRUE in jsonlite::fromJSON; however, I get the following warnings:

1: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
2: In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  row names were found from a short variable and have been discarded
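These warnings appear to come from the step in which jsonlite tries to simplify arrays of records into data frames. As a small sketch of what that option changes, the snippet below parses the same JSON twice, once with the default simplification and once with simplifyVector = FALSE. The JSON string is a hypothetical, heavily trimmed stand-in for a study file.

```r
library(jsonlite)

# Hypothetical, trimmed stand-in: an array of two location records,
# the second one lacking the LocationState field
txt <- '{"LocationList": {"Location": [
  {"LocationCity": "Leeds", "LocationState": "West Yorkshire"},
  {"LocationCity": "Birmingham"}
]}}'

# Default: arrays of records are simplified into data frames
simplified <- fromJSON(txt)

# No simplification: everything stays a plain nested list
raw <- fromJSON(txt, simplifyVector = FALSE)

class(simplified$LocationList$Location)
class(raw$LocationList$Location)
```

In the simplified result the missing LocationState is filled with NA, whereas the unsimplified result keeps two separate lists of different lengths.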

I also tried the solution that was proposed for nested lists generated directly from the clinicaltrials.gov web API (how-to-get-data-out-of-nested-xml-structure).

as_tibble(test$FullStudy$Study)
Error: Tibble columns must have compatible sizes.
* Size 2: Column `DerivedSection`.
* Size 11: Column `ProtocolSection`.
i Only values of size one are recycled.
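The error arises because as_tibble() treats each top-level element as a column, and the two sections contain a different number of modules (11 and 2). One base R workaround, shown here only as a sketch, is to flatten the list with unlist() so that each leaf becomes one column of a single-row data.frame. The study object below is a hypothetical, trimmed version of test$FullStudy$Study.

```r
# Hypothetical, trimmed stand-in for test$FullStudy$Study
study <- list(
  ProtocolSection = list(
    IdentificationModule = list(NCTId = "NCT01455805"),
    StatusModule = list(OverallStatus = "Active, not recruiting"),
    ConditionsModule = list(ConditionList = list(
      Condition = c("Lumbar Spinal Stenosis", "Spondylolisthesis")))
  ),
  DerivedSection = list(
    MiscInfoModule = list(VersionHolder = "February 26, 2021")
  )
)

# The sections have different lengths, which as_tibble() cannot recycle
lengths(study)

# Flatten: leaf names become dot-separated paths, repeated values get a
# numeric suffix (Condition1, Condition2)
flat <- unlist(study)

# One row per study, one column per leaf
one_row <- as.data.frame(as.list(flat), stringsAsFactors = FALSE)
names(one_row)
```

This gives a wide table rather than a tidy one: repeated elements such as conditions or locations end up in numbered columns and would still need reshaping.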

I tried to use tidyjson; however, I could not get a tidy data.frame out of my nested list.

r json xml tidyverse nested-lists

A:

1 upvote | ava | 2/28/2021 | #1

The tidyjson package works perfectly.

Read the JSON file directly with tidyjson::read_json to get the right format for further processing: tbl_json (S3: tbl_json/tbl_df/tbl/data.frame).

# library
library(tidyjson)

# load the JSON file
test <- tidyjson::read_json("NCT0455805.json")

# check the data structure
str(test)
# tbl_json [1 x 2] (S3: tbl_json/tbl_df/tbl/data.frame)

# make a tibble
test %>% tidyjson::spread_all()

# A tibble: 1 x 42
  ..JSON document.id FullStudy.Rank FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~ FullStudy.Study~
  <chr>        <int>          <dbl> <chr>            <chr>            <chr>            <chr>            <chr>            <chr>            <chr>
1 "{\"F~           1         254369 NCT01455805      Minuteman Spina~ Efficacy and Qu~ October 2020     Active, not rec~ October 13, 2011 October 18, 2011
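spread_all() only spreads scalar values, so repeated elements such as the study locations do not show up in this one-row tibble. A possible way to get them into a tidy table with one row per location is to enter the array and gather it, sketched below. The JSON string is a hypothetical, trimmed stand-in for a downloaded file, and the column name location.index is an arbitrary choice.

```r
library(tidyjson)

# Hypothetical, trimmed stand-in for one downloaded study file
json <- '{"FullStudy": {"Study": {"ProtocolSection": {
  "IdentificationModule": {"NCTId": "NCT01455805"},
  "ContactsLocationsModule": {"LocationList": {"Location": [
    {"LocationCity": "Taunton", "LocationState": "Somerset"},
    {"LocationCity": "Birmingham"}
  ]}}
}}}}'

locations <- as.tbl_json(json) %>%
  # keep the study ID as a regular column before descending
  spread_values(
    NCTId = jstring("FullStudy", "Study", "ProtocolSection",
                    "IdentificationModule", "NCTId")
  ) %>%
  # walk down to the array of location records
  enter_object("FullStudy", "Study", "ProtocolSection",
               "ContactsLocationsModule", "LocationList", "Location") %>%
  # one row per array element
  gather_array(column.name = "location.index") %>%
  # spread the scalar fields of each location into columns
  spread_all()

locations
```

Fields that are missing for a location (here LocationState for the second entry) come out as NA. The same pattern should carry over to other repeated blocks in the file, such as ArmGroup or ConditionMesh, by changing the path given to enter_object().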