DPXone
Hello everyone,
I play around with PowerShell fairly often. My current goal was to always have the latest versions of the Sysinternals Suite tools on my machine (via Task Scheduler).
If anyone is interested in how to download multiple files from a website in parallel using multithreading, feel free to study the script or use it as a template.
You can also set $ShowMatchingLinksOnly = $true to simply list the available files without downloading anything.
Suggestions for improvement are welcome.
If you only want to know how multithreading works in PowerShell, scroll down and look at the 4 excerpts from the script (second large code block).
Notes:
- The script only works if you save it as a script file first and then run it. Simply copying and pasting it into the shell fails, because $PSScriptRoot is needed as the base path, and that variable is only populated when the script runs from a saved file.
- To start the script from Task Scheduler or from a batch file, use the parameters below. -File has to be the last parameter, because powershell.exe passes everything after the file name to the script as arguments.
File name in this example: $_Sysinternals_Download_Multithreaded.ps1
Code:
powershell.exe -NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -File "$_Sysinternals_Download_Multithreaded.ps1"
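The scheduled task can also be created from PowerShell itself. The following is only a sketch, assuming Windows 8 / Server 2012 or later, where the ScheduledTasks cmdlets are available. The script folder, the task name and the daily 03:00 trigger are placeholders that are not part of the original script.

```powershell
# Hypothetical location of the saved script; adjust to your own path.
# Single quotes keep PowerShell from expanding "$_Sysinternals..." as a variable.
$ScriptPath = 'C:\Scripts\$_Sysinternals_Download_Multithreaded.ps1'

# -File goes last: everything after it is handed to the script as arguments.
$Arguments = '-NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -File "{0}"' -f $ScriptPath

# Safety switch for this sketch: set to $true to actually create the task.
$RegisterTask = $false

If ($RegisterTask) {
    $Action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument $Arguments
    $Trigger = New-ScheduledTaskTrigger -Daily -At '03:00' # assumed schedule
    Register-ScheduledTask -TaskName 'Sysinternals Update' -Action $Action -Trigger $Trigger
} Else {
    # Only show the command line the task would run
    "powershell.exe $Arguments"
}
```

With $RegisterTask left at $false the sketch only prints the command line, which makes it easy to check the quoting before anything is registered.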
PowerShell:
Clear-Host
#region Input Parameters
$URI = 'https://live.sysinternals.com'
$OutputFolder = join-path $PSScriptRoot 'SysinternalsSuite'
# Define which files to download
$Include = '*.*' #, '*.exe' , '*.hlp' , '*.chm' , '*.txt' , '*.sys' # Do not use '*' -> always with dot '*.*', otherwise you will get subsite links instead of files only
$Exclude = '*.htm*' , '*.php'
# Allow downloading content from websites other than the one defined above
$AllowExternalLinks = $false
# Show matching links and don't download anything! (DEBUGGING)
$ShowMatchingLinksOnly = $false
$WebRequestTimeOut = 30 # seconds
$MaxThreads = 20
#endregion
#region Set a sufficient console buffer size to output full length of results
# To determine a sufficient width, take the number of characters in the longest result line and add ~15
$host.UI.RawUI.BufferSize = [System.Management.Automation.Host.Size]::new(250 , 500)
#endregion
#region Create OutputFolder if necessary
If (-not (Test-Path $OutputFolder)) { New-Item $OutputFolder -ItemType Directory -Force | Out-Null } # Out-Null keeps the folder object out of the result output
#endregion
#region Create runspace pool for Multithreading
$RunspacePool = [runspacefactory]::CreateRunspacePool(1 , $MaxThreads)
$RunspacePool.Open() # Open runspace pool to be able to add runspaces
#endregion
#region Create script block for downloading | Is needed for Multithreading
$ScriptBlock = {
    Param (
        [string] $Filename ,
        [string] $FileURI ,
        [int] $WebRequestTimeOut ,
        [string] $OutputFilePath
    )
    Try {
        # Download the file; -PassThru also returns the response so that the status can be read.
        # -UseBasicParsing avoids the Internet Explorer engine in Windows PowerShell 5.1
        $FileRequest = Invoke-WebRequest $FileURI -OutFile $OutputFilePath -TimeoutSec $WebRequestTimeOut -PassThru -UseBasicParsing
        $Status = $FileRequest.StatusDescription # Should always return status OK = status code 200
    } Catch {
        $Status = ($_.ToString() -split '\r?\n') -join ' | ' # On error return the error message on a single line
    }
    # The padded property names force fixed column widths in the result table.
    # Make sure that the total number of output chars plus ~15 does not exceed the buffer width:
    # (File = 4 chars) + 25 spaces + (Status = 6 chars) + 200 spaces = 235 width
    Return (New-Object psobject -Property ([ordered] @{ "File$(' ' * 25)" = $Filename ; "Status$(' ' * 200)" = $Status }))
}
#endregion
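#region Regular Expressions
$CreatePattern = { # Script block that turns the wildcard filters into one regex pattern
    Param ($InputString)
    $Filters = @($InputString | Where-Object { -not [string]::IsNullOrWhiteSpace($_) })
    If ($Filters.Count -eq 0) {
        # No filter given -> return a pattern that can never match
        '(?!)'
    } Else {
        # Escape regex metacharacters, then translate '*' into "any characters except / ? &"
        ($Filters | ForEach-Object { '(' + ([regex]::Escape($_.Trim()) -replace '\\\*' , '[^/?&]*') + '$)' }) -join '|'
    }
}
If (-not ($Include | Where-Object { -not [string]::IsNullOrWhiteSpace($_) })) { $Include = '*.*' }
$IncludePattern = & $CreatePattern $Include
$ExcludePattern = & $CreatePattern $Exclude
$ProtocolPattern = '^(?:https?|ftp)://'
$BaseURIPattern = "(?<Protocol>$ProtocolPattern)(?<Host>[^/?&]*)"
# Validate the URI first; extracting protocol and host from a non-matching URI would fail
$URIMatch = [regex]::Match($URI , $BaseURIPattern , 'IgnoreCase')
If (-not $URIMatch.Success) {
    Write-Host 'You must specify a protocol for your URI!' -ForegroundColor Red
    Return
}
$Protocol = $URIMatch.Groups['Protocol'].Value # e.g. 'https://'
$BaseURI = $URIMatch.Groups['Host'].Value # e.g. 'live.sysinternals.com'
#endregion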
#region Invoke web request to get the content of the web page
Try {
    # The response object is still needed below to read its links, so it is not disposed here
    $WebRequest = Invoke-WebRequest $URI -TimeoutSec $WebRequestTimeOut
} Catch {
    Write-Host ($_.ToString()) -ForegroundColor Red # Show the error if anything went wrong
    Return # Return to caller scope
}
#endregion
#region Create runspaces and assign them to the runspace pool
# Create ArrayList (not array) to be able to remove items
[System.Collections.ArrayList] $RunspaceCollection = @()
If ($WebRequest) {
    # Get all links that match the include pattern but not the exclude pattern
    $Links = @($WebRequest.Links | Where-Object { ($_.href -match $IncludePattern) -and ($_.href -notmatch $ExcludePattern) })
    # Pattern for absolute links on the domain of $URI; the host name is escaped so that '.' is matched literally
    $InternalPattern = '^' + $ProtocolPattern + [regex]::Escape("$BaseURI") + '(?:[/?#:]|$)'
    If (-not $AllowExternalLinks) { # Keep only relative links and links that reside on the same domain
        $Links = @($Links | Where-Object { ($_.href -notmatch $ProtocolPattern) -or ($_.href -match $InternalPattern) })
    }
    If ($ShowMatchingLinksOnly) { # Show only the matching links and return to caller scope
        $Links | Select-Object @{ n = 'IsExternal' ; e = { ($_.href -match $ProtocolPattern) -and ($_.href -notmatch $InternalPattern) } } , 'innerText' , 'href'
        Return # Return to caller scope
    }
    If ($Links.Count -gt 0) { # Check if there are any links
        Foreach ($Link In $Links) {
            $HREF = $Link.href
            # The link filter above is case-insensitive, so the file name has to be extracted the same way
            $FileName = [regex]::Match($HREF , $IncludePattern , 'IgnoreCase').Value
            # Resolve the link against $URI; absolute (external) links are returned unchanged
            $FileURI = [uri]::new([uri] $URI , $HREF).AbsoluteUri
            #region Multithreading
            $PowerShell = [powershell]::Create() # Create new PowerShell instance
            $PowerShell.RunspacePool = $RunspacePool # Let the instance run on a runspace from the pool
            $Parameters = @{ # Define parameters for the script block
                Filename = $FileName
                FileURI = $FileURI
                WebRequestTimeout = $WebRequestTimeOut
                OutputFilePath = Join-Path $OutputFolder $FileName
            }
            # Add ScriptBlock and parameters to the PowerShell instance
            [void] $PowerShell.AddScript($ScriptBlock).AddParameters($Parameters)
            # Add() instead of += so that the ArrayList is not rebuilt for every item
            [void] $RunspaceCollection.Add((New-Object PSObject -Property @{
                PowerShell = $PowerShell
                RunSpace = $PowerShell.BeginInvoke() # Start the ScriptBlock asynchronously; holds the async result handle
            }))
            #endregion
        }
    }
} Else {
    Write-Host 'Web request didn''t return any content!' -ForegroundColor Red
    Return # Return to caller scope
}
#endregion
#region Loop through the runspace collection to determine if a runspace is completed and to return the result
While ($RunspaceCollection.Count -gt 0) { # While there are runspaces that have not returned yet
    $CompletedRunspaces = @($RunspaceCollection | Where-Object { $_.RunSpace.IsCompleted }) # Filter by already completed runspaces
    Foreach ($Runspace In $CompletedRunspaces) {
        $Runspace.PowerShell.EndInvoke($Runspace.RunSpace) # Return result of the runspace = ScriptBlock result
        $Runspace.PowerShell.Dispose() # Dispose PowerShell instance
        $RunspaceCollection.Remove($Runspace) # Remove runspace from the collection
    }
    Start-Sleep -Milliseconds 100 # Short pause so that the loop does not keep a CPU core busy
}
$RunspacePool.Close() # All results are in -> release the runspace pool
$RunspacePool.Dispose()
#endregion
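To see what the wildcard filters from $Include and $Exclude turn into, the conversion can be tried in isolation. This is a sketch modelled on the $CreatePattern script block, not a copy of it: the function name Convert-WildcardToPattern is hypothetical, [regex]::Escape is used here for the escaping step, and the sample links are made up for the demonstration.

```powershell
# Hypothetical helper, modelled on the $CreatePattern script block of the script
Function Convert-WildcardToPattern {
    Param ([string[]] $Wildcard)

    $Parts = ForEach ($Item In $Wildcard) {
        # Escape regex metacharacters, then turn the escaped '*' into
        # "any run of characters except / ? &", anchored at the end of the link
        '(' + ([regex]::Escape($Item) -replace '\\\*' , '[^/?&]*') + '$)'
    }
    $Parts -join '|'
}

$IncludePattern = Convert-WildcardToPattern '*.exe' , '*.chm'
$ExcludePattern = Convert-WildcardToPattern '*.htm*'

# Sample links, made up for this demonstration
$SampleLinks = '/Procmon.exe' , '/procexp.chm' , '/about.html' , '/tools/'

# Keep links that match an include filter and no exclude filter
$Matching = $SampleLinks | Where-Object { ($_ -match $IncludePattern) -and ($_ -notmatch $ExcludePattern) }
$Matching
```

Because the character class excludes '/', the pattern only ever captures the last path segment, which is why the script can reuse the same pattern to extract the file name from a link.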
If you only want to know how to implement multithreading in PowerShell, here are the 4 parts from my script:
PowerShell:
$MaxThreads = 20
#region Create runspace pool for Multithreading
$RunspacePool = [runspacefactory]::CreateRunspacePool(1 , $MaxThreads)
$RunspacePool.Open() # Open runspace pool to be able to add runspaces
#endregion
#region Create script block for downloading | Is needed for Multithreading => similar to a function
$ScriptBlock = {
    Param (
        [string] $Filename ,
        [string] $FileURI ,
        [int] $WebRequestTimeOut ,
        [string] $OutputFilePath
    )
    Try {
        # Download the file; -PassThru also returns the response so that the status can be read.
        # -UseBasicParsing avoids the Internet Explorer engine in Windows PowerShell 5.1
        $FileRequest = Invoke-WebRequest $FileURI -OutFile $OutputFilePath -TimeoutSec $WebRequestTimeOut -PassThru -UseBasicParsing
        $Status = $FileRequest.StatusDescription # Should always return status OK = status code 200
    } Catch {
        $Status = ($_.ToString() -split '\r?\n') -join ' | ' # On error return the error message on a single line
    }
    # The padded property names force fixed column widths in the result table
    Return (New-Object psobject -Property ([ordered] @{ "File$(' ' * 25)" = $Filename ; "Status$(' ' * 200)" = $Status }))
}
#endregion
# Create ArrayList (not array) to be able to remove items
[System.Collections.ArrayList] $RunspaceCollection = @()
#region Multithreading (in the full script this part runs once per file inside the Foreach loop)
$PowerShell = [powershell]::Create() # Create new PowerShell instance
$PowerShell.RunspacePool = $RunspacePool # Let the instance run on a runspace from the pool
$Parameters = @{ # Define parameters for the script block
    Filename = $FileName
    FileURI = $FileURI
    WebRequestTimeout = $WebRequestTimeOut
    OutputFilePath = Join-Path $OutputFolder $FileName
}
# Add ScriptBlock and parameters to the PowerShell instance
[void] $PowerShell.AddScript($ScriptBlock).AddParameters($Parameters)
# Add() instead of += so that the ArrayList is not rebuilt for every item
[void] $RunspaceCollection.Add((New-Object PSObject -Property @{
    PowerShell = $PowerShell
    RunSpace = $PowerShell.BeginInvoke() # Start the ScriptBlock asynchronously; holds the async result handle
}))
#endregion
#region Loop through the runspace collection to determine if a runspace is completed and to return the result
While ($RunspaceCollection.Count -gt 0) { # While there are runspaces that have not returned yet
    $CompletedRunspaces = @($RunspaceCollection | Where-Object { $_.RunSpace.IsCompleted }) # Filter by already completed runspaces
    Foreach ($Runspace In $CompletedRunspaces) {
        $Runspace.PowerShell.EndInvoke($Runspace.RunSpace) # Return result of the runspace = ScriptBlock result
        $Runspace.PowerShell.Dispose() # Dispose PowerShell instance
        $RunspaceCollection.Remove($Runspace) # Remove runspace from the collection
    }
    Start-Sleep -Milliseconds 100 # Short pause so that the loop does not keep a CPU core busy
}
$RunspacePool.Close() # All results are in -> release the runspace pool
$RunspacePool.Dispose()
#endregion
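The four parts can be tried out without any network access by swapping the download for a short pause. The following is a minimal sketch under that assumption: the job count, the delays, and the names $Jobs, Handle and $Results are chosen for this example only and do not appear in the script above.

```powershell
$MaxThreads = 3

# 1) Runspace pool
$RunspacePool = [runspacefactory]::CreateRunspacePool(1 , $MaxThreads)
$RunspacePool.Open()

# 2) Script block: placeholder work (a pause) instead of a download
$ScriptBlock = {
    Param ([int] $Id , [int] $DelayMs)
    Start-Sleep -Milliseconds $DelayMs
    [pscustomobject] @{ Id = $Id ; DelayMs = $DelayMs }
}

# 3) One PowerShell instance per job, all sharing the same pool
$Jobs = [System.Collections.ArrayList]::new()
ForEach ($Id In 1..6) {
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    [void] $PowerShell.AddScript($ScriptBlock).AddParameters(@{ Id = $Id ; DelayMs = 100 * $Id })
    [void] $Jobs.Add([pscustomobject] @{ PowerShell = $PowerShell ; Handle = $PowerShell.BeginInvoke() })
}

# 4) Collect the results as the jobs finish
$Results = While ($Jobs.Count -gt 0) {
    ForEach ($Job In @($Jobs | Where-Object { $_.Handle.IsCompleted })) {
        $Job.PowerShell.EndInvoke($Job.Handle) # Emits the script block output
        $Job.PowerShell.Dispose()
        $Jobs.Remove($Job)
    }
    Start-Sleep -Milliseconds 50
}

$RunspacePool.Close()
$RunspacePool.Dispose()

$Results | Sort-Object Id
```

Six jobs on a pool of three means that at most three pauses run at the same time, so the later jobs wait for a free runspace. Changing $MaxThreads is an easy way to watch the effect on the total run time.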