[PowerShell] Download multiple/all files "multithreaded" from a website

DPXone

Hi everyone,

I tinker with PowerShell quite often. My current goal was to always have the latest versions of the Sysinternals Suite tools on my machine (via Task Scheduler).
If anyone is interested in how to download multiple files from a website in parallel ("multithreaded"), feel free to study the script or use it as a template.

Setting $ShowMatchingLinksOnly = $true just lists the available files without downloading anything.

Suggestions for improvement are welcome. :)

If you only want to know how multithreading works in PS, just scroll down and look at the 4 excerpts from the script (second large code block).

Info:
  1. The script only works if you save it as a script file and run it from there.
    Simply copying and pasting it into the shell fails, because $PSScriptRoot is needed as the base path and is only populated when the code runs from a saved script file.
  2. To start the script from Task Scheduler or a batch file, use these parameters (-File has to come last, because everything after the script path is passed to the script as arguments):
    File name in this example: $_Sysinternals_Download_Multithreaded.ps1
    Code:
    powershell.exe -NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -File "$_Sysinternals_Download_Multithreaded.ps1"
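For the Task Scheduler part, here is a minimal sketch of how the task could be registered from PowerShell itself instead of through the GUI. The script path, the task name, the start time and the helper name Register-DownloadTask are assumptions for illustration only. The function relies on the ScheduledTasks cmdlets that ship with current Windows versions and is only defined here, not called.

```powershell
# Hypothetical values: adjust path, task name and start time to your setup.
$ScriptPath = 'C:\Scripts\$_Sysinternals_Download_Multithreaded.ps1'
$TaskName   = 'Sysinternals Download'

# -File comes last so that the switches before it are consumed by powershell.exe itself.
$Argument = '-NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -File "{0}"' -f $ScriptPath

function Register-DownloadTask {
    param (
        [string] $Name ,
        [string] $Argument
    )
    # Windows only: New-ScheduledTaskAction, New-ScheduledTaskTrigger and
    # Register-ScheduledTask come from the ScheduledTasks module.
    $Action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument $Argument
    $Trigger = New-ScheduledTaskTrigger -Daily -At '06:00'
    Register-ScheduledTask -TaskName $Name -Action $Action -Trigger $Trigger
}
```

To actually create the task, call Register-DownloadTask -Name $TaskName -Argument $Argument once on the target machine.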
PowerShell:
Clear-Host

#region Input Parameters                  
$URI = 'https://live.sysinternals.com'
$OutputFolder = join-path $PSScriptRoot 'SysinternalsSuite'

# Define which files to download        
$Include = '*.*' #, '*.exe' , '*.hlp' , '*.chm' , '*.txt' , '*.sys'      # Do not use '*'  ->  always with dot '*.*', otherwise you will get subsite links instead of files only   
$Exclude = '*.htm*' , '*.php'         

# Allow downloading content from websites other than the one defined above
$AllowExternalLinks = $false

# Show matching links and don't download anything! (DEBUGGING)        
$ShowMatchingLinksOnly = $false

$WebRequestTimeOut = 30 # seconds                   
$MaxThreads = 20
#endregion                   


#region Set a sufficient console buffer size to output full length of results                   
# To determine which size is sufficient you have to add up the chars of the result of the longest line/width + ~ 15                         
$host.UI.RawUI.BufferSize = [System.Management.Automation.Host.Size]::new(250 , 500)
#endregion                            


#region Create OutputFolder if necessary                   
If (-not (Test-Path $OutputFolder)) { $null = New-Item $OutputFolder -ItemType Directory -Force } # $null = keeps the new folder object out of the console output
#endregion                        


#region Create runspace pool for Multithreading                
$RunspacePool = [runspacefactory]::CreateRunspacePool(1 , $MaxThreads)
$RunspacePool.Open() # Open runspace pool to be able to add runspaces                   
#endregion                                                   


#region Create script block for downloading | Is needed for Multithreading                
$ScriptBlock = {
    Param (
        [string] $Filename ,
        [string] $FileURI ,
        [int] $WebRequestTimeOut ,
        [string] $OutputFilePath
    )
 
    Try {
        $FileRequest = Invoke-WebRequest $FileURI -outfile $OutputFilePath -TimeoutSec $WebRequestTimeOut -PassThru # Download file                   
        $Status = $FileRequest.StatusDescription # Should always return status OK = status code 200                   
    } Catch {
        $Status = $($_.toString() -split "`r`n" -join ' | ') # On error return error message                   
    }
                          
    Return (New-Object psobject -Property([ordered] @{ "File$(' '*25)" = $Filename ; "Status$(' '*200)" = $Status }))
    # Make sure that the sum of the output chars + ~15 is less than or equal to the buffer width:
    # (File = 4 chars) + 25 spaces + (Status = 6 chars) + 200 spaces = 235 width
}
#endregion                            


#region Regular Expressions                 
$CreatePattern = { # ScriptBlock that turns the wildcard filters into one regex
    Param ($InputString)
    If ([string]::IsNullOrWhiteSpace(-join $InputString)) {
        # Empty or $null filter -> return a pattern that can never match
        '(?!)'
    } Else {
        ($InputString | ForEach-Object { "($($_ -replace '\.' ,'\.' -replace '\*','[^\/\?\&]*')$)" }) -join '|'
    }
}

If ($Include.trim() -in '' , $null) { $Include = '*.*' }
$IncludePattern = & $CreatePattern $Include
$ExcludePattern = & $CreatePattern $Exclude

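$ProtocolPattern = '^(https?|ftp):\/\/' # Anchored for all protocols, so 'ftp' can no longer match in the middle of a string
$BaseURIPattern = "($ProtocolPattern)([^\/\?\&]*)(.*)"

$Protocol = [regex]::Match($URI , $ProtocolPattern).Value # e.g. 'https://'
$BaseURI = [regex]::Match($URI , $BaseURIPattern).Groups[3].Value # Host part, e.g. 'live.sysinternals.com'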
#endregion                   


If ($URI -notmatch $ProtocolPattern) {
    write-host 'You must specify a protocol for your URI!' -ForegroundColor Red
    Return
}


#region Invoke web request to get the content of the web page                   
Try {
    $WebRequest = Invoke-WebRequest $Uri -TimeoutSec $WebRequestTimeOut
    $WebRequest.Dispose() # Dispose invoked web request                   
} Catch {
    Write-Host($_.toString()) -ForegroundColor Red # return error if anything went wrong                   
    Return # return to caller scope                   
}
#endregion                    


#region Create runspaces and assign them to the runspace pool
 
# Create ArrayList (not array) to be able to remove items                   
[System.Collections.ArrayList] $RunspaceCollection = @()

If ($WebRequest) {
    $Links = @($WebRequest.links | ? {($_.href -match $IncludePattern) -and ($_.href -notmatch $ExcludePattern) }) # Get links                       
 
    If (-not $AllowExternalLinks) { # If AllowExternalLinks = $false -> keep only links that reside on the same domain
        $Links = $Links | ? {($_.href -notmatch $ProtocolPattern) -or ($_.href -match "^$ProtocolPattern($BaseURI).*") }
    }
 
    If ($ShowMatchingLinksOnly) { # Show only matching links and return to caller scope 
        $Links | Select @{ n = 'IsExternal' ; e = {($_.href -match $ProtocolPattern) -and ($_.href -notmatch "^$ProtocolPattern($BaseURI).*") } } , 'innerText' , 'href'
        Return # return to caller scope 
    }
 
    If ($Links.count -gt 0) { # Check if there are any links                       
        Foreach ($Link In $Links) {
            $HREF = $Link.HREF
            $FileName = [regex]::Match($HREF , $IncludePattern).value
         
            If ($HREF -match $ProtocolPattern) {
                # External link  
                $FileURI = $Link.HREF
            } Else {
                $FileURI = "$Protocol$BaseURI$HREF"
            }
         
            #region Multithreading
            $PowerShell = [powershell]::Create() # Create new powershell instance/runspace                       
            $PowerShell.RunSpacePool = $RunspacePool # Assign new instance/runspace to runspace pool                       
         
            $Parameters = @{ # Define parameters for the runspace          
                Filename = $FileName
                FileURI = $FileURI
                WebRequestTimeout = $WebRequestTimeOut
                OutputFilePath = Join-Path $OutputFolder $FileName
            }
         
            # Add ScriptBlock and parameters to the runspace within the runspace pool                   
            [void] $PowerShell.AddScript($ScriptBlock).AddParameters($Parameters)
         
            [void] $RunspaceCollection.Add((New-Object PSObject -Property @{ # Add() instead of += so the list is not rebuilt for every link
                PowerShell = $PowerShell
                RunSpace = $PowerShell.BeginInvoke() # Invoke ScriptBlock of the runspace
            }))
            #endregion
        }
    }
} Else {
    Write-Host('Web request didn''t return any content!') -ForegroundColor Red
    Return # return to caller scope                   
}
#endregion                   


#region Loop through the runspace collection to determine if a runspace is completed and to return the result                   
While ($RunspaceCollection.Count -gt 0) { # While there are runspaces whose result has not been returned yet
    $CompletedRunspaces = @($RunspaceCollection | ? { $_.Runspace.IsCompleted }) # Filter by already completed runspaces

    Foreach ($Runspace In $CompletedRunspaces) {
        $Runspace.PowerShell.EndInvoke($Runspace.Runspace) # Return result of the runspace = ScriptBlock result
        $Runspace.PowerShell.Dispose() # Dispose runspace
        $RunspaceCollection.Remove($Runspace) # Remove runspace from the collection
    }

    Start-Sleep -Milliseconds 100 # Wait a moment instead of spinning in a busy loop
}

$RunspacePool.Close() # All downloads are done -> release the runspace pool
$RunspacePool.Dispose()
#endregion
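
The wildcard-to-regex conversion from the "Regular Expressions" region can be tried in isolation. The following is only a sketch of the same idea: the helper name ConvertTo-LinkPattern, the filters and the hrefs are made up for illustration and are not part of the script above.

```powershell
# Same conversion idea as $CreatePattern: escape the dot, turn * into
# "any run of characters except / ? &" and anchor each alternative at the end.
function ConvertTo-LinkPattern { # hypothetical helper name
    param ([string[]] $Wildcard)
    ($Wildcard | ForEach-Object {
        "($($_ -replace '\.' , '\.' -replace '\*' , '[^\/\?\&]*')$)"
    }) -join '|'
}

$IncludePattern = ConvertTo-LinkPattern '*.exe' , '*.chm'
$ExcludePattern = ConvertTo-LinkPattern '*64.exe'

# Made-up hrefs, as they might appear in the links of a web page
$Hrefs = '/procexp.exe' , '/procexp64.exe' , '/procexp.chm' , '/about.html' , '/tools/'

# Keep what matches the include pattern and does not match the exclude pattern
$Matching = @($Hrefs | Where-Object { ($_ -match $IncludePattern) -and ($_ -notmatch $ExcludePattern) })

# The matched part of the href is the bare file name
$FileNames = $Matching | ForEach-Object { [regex]::Match($_ , $IncludePattern).Value }
$FileNames
```

With these inputs only /procexp.exe and /procexp.chm pass both filters, and the extracted file names no longer contain the leading slash.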



If you only want to know how to implement multithreading in PowerShell, here are the 4 parts from my script:
PowerShell:
$MaxThreads = 20

#region Create runspace pool for Multithreading   
$RunspacePool = [runspacefactory]::CreateRunspacePool(1 , $MaxThreads)
$RunspacePool.Open() # Open runspace pool to be able to add runspaces                    
#endregion                                                    


#region Create script block for downloading | Is needed for Multithreading  => similar to a function                 
$ScriptBlock = {
    Param (
        [string] $Filename ,
        [string] $FileURI ,
        [int] $WebRequestTimeOut ,
        [string] $OutputFilePath
    )
 
    Try {
        $FileRequest = Invoke-WebRequest $FileURI -outfile $OutputFilePath -TimeoutSec $WebRequestTimeOut -PassThru # Download file                    
        $Status = $FileRequest.StatusDescription # Should always return status OK = status code 200                    
    } Catch {
        $Status = $($_.toString() -split "`r`n" -join ' | ') # On error return error message                    
    }
                          
    Return (New-Object psobject -Property([ordered] @{ "File$(' '*25)" = $Filename ; "Status$(' '*200)" = $Status }))                  
}
#endregion                  

# Create ArrayList (not array) to be able to remove items                    
[System.Collections.ArrayList] $RunspaceCollection = @()

#region Multithreading
$PowerShell = [powershell]::Create() # Create new powershell instance/runspace                        
$PowerShell.RunSpacePool = $RunspacePool # Assign new instance/runspace to runspace pool                        

$Parameters = @{ # Define parameters for the runspace           
    Filename = $FileName
    FileURI = $FileURI
    WebRequestTimeout = $WebRequestTimeOut
    OutputFilePath = Join-Path $OutputFolder $FileName
}

# Add ScriptBlock and parameters to the runspace within the runspace pool                    
[void] $PowerShell.AddScript($ScriptBlock).AddParameters($Parameters)

[void] $RunspaceCollection.Add((New-Object PSObject -Property @{ # Add() instead of += so the list is not rebuilt for every runspace
    PowerShell = $PowerShell
    RunSpace = $PowerShell.BeginInvoke() # Invoke ScriptBlock of the runspace
}))
#endregion


#region Loop through the runspace collection to determine if a runspace is completed and to return the result                    
While ($RunspaceCollection.Count -gt 0) { # While there are runspaces whose result has not been returned yet
    $CompletedRunspaces = @($RunspaceCollection | ? { $_.Runspace.IsCompleted }) # Filter by already completed runspaces

    Foreach ($Runspace In $CompletedRunspaces) {
        $Runspace.PowerShell.EndInvoke($Runspace.Runspace) # Return result of the runspace = ScriptBlock result
        $Runspace.PowerShell.Dispose() # Dispose runspace
        $RunspaceCollection.Remove($Runspace) # Remove runspace from the collection
    }

    Start-Sleep -Milliseconds 100 # Wait a moment instead of spinning in a busy loop
}

$RunspacePool.Close() # All work is done -> release the runspace pool
$RunspacePool.Dispose()
#endregion
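
The four parts above only run inside the full script, since $FileName, $FileURI and $OutputFolder come from the link loop. As a rough, self-contained sketch of the same runspace pool pattern, the following replaces the download with a stand-in workload (a short sleep plus a calculation). The variable names and the workload are assumptions for illustration. Unlike the script, it collects the results with a blocking EndInvoke in submission order instead of polling IsCompleted.

```powershell
$MaxThreads = 4
$Pool = [runspacefactory]::CreateRunspacePool(1 , $MaxThreads)
$Pool.Open()

# Stand-in workload: wait briefly, then return an object describing the input
$Work = {
    param ([int] $Number)
    Start-Sleep -Milliseconds 200
    [pscustomobject] @{ Number = $Number ; Square = $Number * $Number }
}

$Jobs = [System.Collections.ArrayList]::new()
foreach ($Number in 1..8) {
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $Pool # Run this instance inside the shared pool
    [void] $PowerShell.AddScript($Work).AddArgument($Number)
    [void] $Jobs.Add([pscustomobject] @{
        PowerShell = $PowerShell
        Handle     = $PowerShell.BeginInvoke() # Start asynchronously
    })
}

$Results = foreach ($Job in $Jobs) {
    $Job.PowerShell.EndInvoke($Job.Handle) # Blocks until this job has finished
    $Job.PowerShell.Dispose()
}

$Pool.Close()
$Pool.Dispose()

$Results | Sort-Object Number
```

With 8 jobs and at most 4 runspaces, the jobs run in two waves, so the whole thing takes noticeably less time than 8 sequential sleeps would.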
 
You could also have used WSCC for this task. It works with NirSoft and Sysinternals.
 
Yuuri wrote:
You could also have used WSCC for this task. It works with NirSoft and Sysinternals.

Yes, I know the tool and have used it before, but as I said, I just like to tinker and build my own stuff, so that I keep learning on my own and have code fragments ready for future projects 😉
 