0

I have 50K xml files (zml technically, but behave the same), each with metadata for a single file. I need to combine that metadata into a single SQL table so I can manipulate it there. The xml files have varying metadata tags, with some overlapping and some not. How can I get from 50K xml to a single SQL table?

I've looked at Powershell and can get part of the data from a single file parsed out, but haven't figured out writing it into a csv yet.

I've looked at XML data maps in Excel but it only seems to allow the import of a single xml file (1 row of data essentially).

I've also used a co-worker's script to import a single file's worth into SQL, but it puts the whole SQL into a single field, and would then need to be parsed out.

Looking for the best, most efficient path forward, or other ideas.

I have also considered combining all the individual xml files first, but there isn't anything in the file that would identify which file its from - so no way to link it back to the file its describing.

4
  • 1
    You can use XML data type
    – LMC
    Commented Feb 24, 2023 at 15:58
  • What about Excel: Data, Get Data, From File, From Folder
    – jkpieterse
    Commented Feb 24, 2023 at 15:59
  • How about parsing xml in python or nodejs?
    – Mislah
    Commented Feb 24, 2023 at 16:02
  • Do you need to load XML files into a db table (1) as-is, or (2) shred the XML into a rectangular format? Commented Feb 24, 2023 at 16:30

1 Answer 1

0

Figured it out! Used powershell to parse out each section.

##File with list of paths to parse

$CSV = Import-Csv "C:\Folder\AllZMLFiles.csv"

foreach($LINE in $CSV)
    {
    $Path = $LINE.filpath


 [xml]$xmlFile = Get-Content -Path $Path

 ## For each XML section, list out all objects
$xmlFile.document.File |  ForEach-Object {

     ## output the result object
     [pscustomobject]@{
        Source = $Path
         DateCreated = $_.DateCreated
         DateModified = $_.DateModified
         JustUploaded = $_.JustUploaded
         OrigFilename = $_.OrigFilename
     }
 } | Export-Csv -Path C:\Folder\File.csv -NoTypeInformation -Append ##Append to CSV


 [xml]$xmlFile = Get-Content -Path $Path


$xmlFile.document.Fields |  ForEach-Object {

     ## output the result object
     [pscustomobject]@{
        Source = $Path
         Producer = $_.Producer
         ProcessDate = $_.ProcessDate
         Version = $_.Version
         OcrEngine = $_.OcrEngine
         secure = $_.secure
         LABEL = $_.LABEL
         Numero_Contrat = $_.Numero_Contrat
         Date_du_contrat = $_.Date_du_contrat
         Fin_du_contrat = $_.Fin_du_contrat
         Duree = $_.Duree
         Type_Contrat = $_.Type_Contrat
         Artiste = $_.Artiste
         Nom_civil_artiste = $_.Nom_civil_artiste
         Contractant = $_.Contractant
         Type_Document = $_.Type_Document
         Objet = $_.Objet
         Titre_phono_video = $_.Titre_phono_video
         Commentaires = $_.Commentaires
         Statut = $_.Statut
         Conteneur = $_.Conteneur
         Boite = $_.Boite
         Origine = $_.Origine
         OrigFilename = $_.OrigFilename
         Type = $_.Type
     }
 } | Export-Csv -Path C:\Folder\Fields.csv -NoTypeInformation -Append


 [xml]$xmldtInfos = Get-Content -Path $Path


$xmldtInfos.document.dtInfos |  ForEach-Object {

     ## output the result object
     [pscustomobject]@{
        Source = $Path
         ID = $_.ID
         TypeID = $_.TypeID
         WordCount = $_.WordCount
      
     }
 } | Export-Csv -Path C:\ACG\Zedoc\dtInfos.csv -NoTypeInformation -Append

    }

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.