Several changes need to be done manually to the file:
...etc...
, these were copied manually. <div style="clear: both"> <img ...> </div>
[0-9][0-9][0-9][0-9];
- These are usually M$ characters which aren't printed out correctly.
||
- Pier interprets this a centering text, and leaves an empty cell at the end.&#43;
. Exceptions were: <font size=+2>
+http:...Open+%2F...+
Here's the Smalltalk code to fix these. It reads the file wiki.test.st and creates the file test.new.st:
| file1 line outStream shouldConvertPre preFormattedLines updatedPreLines count markupConversions convertOriginalBlock conversionDictionary convertHtmlToPier filename dir | Transcript cr; show: 'Starting at: '; show: Time now. filename := 'test.new.st'. markupConversions := #('==' '''''''''' '""' '^;' '^;' '__' '@@' '\^\^') collect: [:pattern | RxMatcher forString: pattern ignoreCase: false]. convertOriginalBlock := [:string | markupConversions inject: string into: [:str :re || results | results := re copy: str translatingMatchesUsing: [:match | '\', match]]. (RxMatcher forString: '[^#-]--' ignoreCase: false) copy: results translatingMatchesUsing: [:match | match first asString, '\--']]. (conversionDictionary := Dictionary new) at: (RxMatcher forString: '<code>[^<]*</code>' ignoreCase: true) put: #('==' 7 7); at: (RxMatcher forString: '<i>[^<]*</i>' ignoreCase: true) put: #('''''''''' 4 4); at: (RxMatcher forString: '<b>[^<]*</b>' ignoreCase: true) put: #('""' 4 4); at: (RxMatcher forString: '<u>[^<]*</u>' ignoreCase: true) put: #('__' 4 4). convertHtmlToPier := [:string | string isEmpty ifTrue: [string] ifFalse: [ string := ([conversionDictionary keys contains: [:re | re search: string]] whileTrue: [ (conversionDictionary keys select: [:re | re search: string]) do: [:re | string := re copy: string translatingMatchesUsing: [:match || array | array := conversionDictionary at: re. array first, (match copyFrom: array second to: match size - (array third)) , array first]]]) ifNil: [string]. (('*font size=+*' match: string) or: [(RxMatcher forString: 'http[^ ]*\+' ignoreCase: false) search: string]) ifTrue: [string] ifFalse: [string := (RxMatcher forString: '\+' ignoreCase: true) copy: string translatingMatchesUsing: [:p | '+']. string first = $| ifTrue: [string := (RxMatcher forString: '\|\|' ignoreCase: true) copy: string translatingMatchesUsing: [:p | '| |']]. string ]]]. count := 0. line := nil. preFormattedLines := OrderedCollection new. updatedPreLines := OrderedCollection new. outStream := file1 := nil. [ ((dir := FileDirectory default) fileExists: filename) ifTrue: [self halt. outStream := dir oldFileNamed: filename. outStream setToEnd] ifFalse: [outStream := dir newFileNamed: filename]. file1 := FileStream fileNamed: 'wiki.test.st'. "file2 := FileDirectory default newFileNamed: 'test.new.st'." shouldConvertPre := false. [file1 atEnd] whileFalse: [line := file1 nextLine. count := count + 1. (line notEmpty and: [line first = $=]) ifTrue: [shouldConvertPre := shouldConvertPre | (line includesAnyOf: '<'). preFormattedLines add: line. shouldConvertPre ifFalse: [ updatedPreLines add: (line copyWithRegex: '&[^;]*;' matchesTranslatedUsing: [:p || digits | p asUppercase = '<' ifTrue: ['<'] ifFalse: [p asUppercase = '>' ifTrue: ['>'] ifFalse: [(p second = $# and: [(digits := p copyFrom: 3 to: (p size - 1)) isAllDigits and: [digits asInteger < 256]]) ifTrue: [String with: (Character value: digits asInteger)] ifFalse: [shouldConvertPre := true. p]]]])] ] ifFalse: [ preFormattedLines isEmpty ifFalse: [ shouldConvertPre ifTrue: [outStream nextPutAll: '<pre>'. preFormattedLines allButLast do: [:lineStartingWithEquals | outStream nextPutAll: ( convertHtmlToPier value: (convertOriginalBlock value: lineStartingWithEquals copyWithoutFirst)). outStream cr; cr]. (preFormattedLines last matchesRegex: '.*[^'']''.') ifTrue: [outStream nextPutAll: ( convertHtmlToPier value: (convertOriginalBlock value: (preFormattedLines last copyFrom: 2 to: (preFormattedLines last size - 2)))). outStream nextPutAll: '</pre>''.'] ifFalse: [outStream nextPutAll: ( convertHtmlToPier value: (convertOriginalBlock value: preFormattedLines last copyWithoutFirst)). outStream nextPutAll: '</pre>']. outStream cr] ifFalse: [ updatedPreLines do: [:lineStartingWithEquals | outStream nextPutAll: lineStartingWithEquals; cr]]]. preFormattedLines := OrderedCollection new. updatedPreLines := OrderedCollection new. shouldConvertPre := false. outStream nextPutAll: (convertHtmlToPier value: (convertOriginalBlock value: line)); cr] ] ] ensure: [file1 isNil ifFalse: [file1 close]. outStream isNil ifFalse: [outStream close]]. Transcript cr; show: 'Read '; show: count; show: ' lines'. Transcript cr; show: 'Finishing at: '; show: Time now.
After importing, the persistency will have the page changes, but they won't be sorted by date. Use the following to re-sort them:
PRKernel instances anyOne persistency inspect.
In the inspector, do the following:
history := (self history asSortedCollection: [:a :b | a command timestamp < b command timestamp]) asOrderedCollection
When a page is moved, it will not correct its links. Run the following to fix this (taken from PRExportImportWidget>>sanitizeImport
):
PRKernel instances anyOne root enumerator everything do: [:page | page outgoingReferences do: [:link | (link isInternal and: [link isBroken not]) ifTrue: [link update]]]
Its also a good idea to look for any broken links, there should be no more broken links when in the Smallwiki Assessment step.
| brokenLinks | brokenLinks := Dictionary new. PRKernel instances anyOne root enumerator everything do: [:page | page outgoingReferences do: [:link | (link isInternal and: [link isBroken]) ifTrue: [brokenLinks at: link put: page]]]. brokenLinks inspect