Class: CrossOrigen::XMLDoc
- Inherits: Object
  - Object
    - CrossOrigen::XMLDoc
- Defined in: lib/cross_origen/xml_doc.rb
Overview
This is the base class of all doc formats that are XML-based.
Defined Under Namespace
Classes: CreationInfo, ImportInfo
Constant Summary
- HTML_TRANSFORMS =
These tags (in many cases illegal) will be forced to their valid equivalents. The transforms are executed in the defined order, so later transforms can assume, for example, that every 'row' has already been converted to 'tr'.
{
  'table/title'  => 'caption',
  'table//row'   => 'tr',
  'thead//entry' => 'th',
  'table//entry' => 'td',
  'td/p'         => 'span',
  'th/p'         => 'span'
}
- HTML_TRANSFORMER =
This can be used to perform additional node-by-node transformation if required. Normally it should be used when a node attribute needs to be transformed.
lambda do |env|
  if env[:node_name] == 'td' || env[:node_name] == 'th'
    if env[:node].attr('nameend')
      first = env[:node].attr('namest').sub('col', '').to_i
      last = env[:node].attr('nameend').sub('col', '').to_i
      env[:node].set_attribute('colspan', (last - first + 1).to_s)
    end
  end
end
- HTML_SANITIZATION_CONFIG =
Defines the rules for sanitization of any HTML strings that will be converted to markdown for representation within Origen
{
  # Only these tags will be allowed through, everything else will be stripped
  # Note that this is applied after the transforms listed above
  elements: %w(b em i strong u p ul ol li table tr td th tbody thead),
  attributes: { 'td' => ['colspan'], 'th' => ['colspan'] },
  # Not planning to allow any of these right now, but keeping around
  # as an example of how to do so
  #:protocols => {
  #  'a' => {'href' => ['http', 'https', 'mailto']}
  # }
  transformers: HTML_TRANSFORMER
}
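The colspan arithmetic performed by HTML_TRANSFORMER can be looked at in isolation. The following is a minimal sketch, not part of CrossOrigen: the helper name colspan_for is hypothetical, and it assumes column names of the form 'colN', as the transformer's use of sub('col', '') suggests.

```ruby
# Hypothetical helper (not part of CrossOrigen) that mirrors the arithmetic
# in HTML_TRANSFORMER: strip the 'col' prefix from the start and end column
# names, then count the columns covered, inclusive of both ends.
def colspan_for(namest, nameend)
  first = namest.sub('col', '').to_i
  last = nameend.sub('col', '').to_i
  last - first + 1
end

puts colspan_for('col2', 'col4') # prints 3
puts colspan_for('col1', 'col1') # prints 1
```

Note that the transformer only checks for the presence of 'nameend', so this sketch likewise assumes that 'namest' is present whenever 'nameend' is.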
Instance Attribute Summary
- #creation_info ⇒ Object
  Returns the value of attribute creation_info.
- #import_info ⇒ Object
  Returns the value of attribute import_info.
- #owner ⇒ Object (readonly)
  Returns the object that included the CrossOrigen module.
Instance Method Summary
- #doc(path, options = {}) ⇒ Object
  Opens the file at the given path and yields it wrapped in a Nokogiri XML document.
- #extract(element, path, options = {}) ⇒ Object
- #fetch(xml, options = {}) ⇒ Object
  Fetches the passed XML snippet, then extracts and formats its data.
- #initialize(owner) ⇒ XMLDoc (constructor)
  A new instance of XMLDoc.
- #pre_sanitize(html) ⇒ Object
  Freescale register descriptions are like the wild west, so some pre-screening is needed to approach valid HTML before handing off to other off-the-shelf sanitizers.
- #to_html(string, _options = {}) ⇒ Object
  Converts the given markdown string to HTML.
- #to_markdown(html, _options = {}) ⇒ Object
  Does its best to convert the given HTML fragment to markdown.
- #try(*methods) ⇒ Object
  Tries the given methods on the owner and returns the first one to return a value; returns nil if no value is found.
Constructor Details
#initialize(owner) ⇒ XMLDoc
Returns a new instance of XMLDoc.
# File 'lib/cross_origen/xml_doc.rb', line 60
def initialize(owner)
  @owner = owner
  @creation_info = CreationInfo.new
  @import_info = ImportInfo.new
end
Instance Attribute Details
#creation_info ⇒ Object
Returns the value of attribute creation_info.
# File 'lib/cross_origen/xml_doc.rb', line 12
def creation_info
  @creation_info
end
#import_info ⇒ Object
Returns the value of attribute import_info.
# File 'lib/cross_origen/xml_doc.rb', line 12
def import_info
  @import_info
end
#owner ⇒ Object (readonly)
Returns the object that included the CrossOrigen module.
# File 'lib/cross_origen/xml_doc.rb', line 58
def owner
  @owner
end
Instance Method Details
#doc(path, options = {}) ⇒ Object
Opens the file at the given path and yields it wrapped in a Nokogiri XML document.
# File 'lib/cross_origen/xml_doc.rb', line 86
def doc(path, options = {})
  require 'nokogiri'
  File.open(path) do |f|
    yield Nokogiri::XML(f)
  end
end
#extract(element, path, options = {}) ⇒ Object
# File 'lib/cross_origen/xml_doc.rb', line 94
def extract(element, path, options = {})
  options = {
    format:   :string,
    hex:      false,
    default:  nil,
    downcase: false,
    return:   :text,
    # A value or array of values which are considered to be nil, if this is the value
    # to be returned then nil will be returned instead
    nil_on:   false
  }.merge(options)
  node = element.at_xpath(path)
  if node
    if options[:format] == :string
      str = node.send(options[:return]).strip
      str = str.downcase if options[:downcase]
      if options[:nil_on] && [options[:nil_on]].flatten.include?(str)
        nil
      else
        str
      end
    elsif options[:format] == :integer
      val = node.send(options[:return])
      if val =~ /^0x(.*)/
        Regexp.last_match[1].to_i(16)
      elsif options[:hex]
        val.to_i(16)
      else
        val.to_i(10)
      end
    else
      fail "Unknown format: #{options[:format]}"
    end
  else
    options[:default]
  end
end
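The :integer branch of #extract can be tried without an XML document. The sketch below is an illustration only: parse_integer is a hypothetical stand-alone helper, not a CrossOrigen method, and it takes the node text directly as a string.

```ruby
# Hypothetical stand-alone version of the :integer branch of #extract.
# A leading '0x' always selects base 16; otherwise the hex flag decides
# between base 16 and base 10.
def parse_integer(val, hex: false)
  if val =~ /^0x(.*)/
    Regexp.last_match[1].to_i(16)
  elsif hex
    val.to_i(16)
  else
    val.to_i(10)
  end
end

puts parse_integer('0x1F')          # prints 31
puts parse_integer('1F', hex: true) # prints 31
puts parse_integer('42')            # prints 42
```

The prefix match is case sensitive, so a value written as '0X1F' falls through to the other branches.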
#fetch(xml, options = {}) ⇒ Object
Fetches the passed XML snippet, then extracts and formats its data.
# File 'lib/cross_origen/xml_doc.rb', line 166
def fetch(xml, options = {})
  options = {
    type:          String,
    downcase:      false,
    symbolize:     false,
    strip:         false,
    squeeze:       false,
    squeeze_lines: false,
    rm_specials:   false,
    whitespace:    false,
    get_text:      false,
    to_i:          false,
    to_html:       false,
    to_bool:       false,
    children:      false,
    to_dec:        false,
    to_f:          false,
    underscore:    false
  }.update(options)
  options[:symbolize] = options[:to_sym] if options[:to_sym]
  # Check for incompatible options
  xml_orig = xml
  numeric_methods = [:to_i, :to_f, :to_dec]
  if options[:get_text] == true && options[:to_html] == true
    fail 'Cannot use :get_text and :to_html options at the same time, exiting...'
  end
  if options[:symbolize] == true
    fail 'Cannot convert to a number of any type and symbolize at the same time' if numeric_methods.reject { |arg| options[arg] == true }.size < 3
  end
  fail 'Cannot select multiple numeric conversion args at the same time' if numeric_methods.reject { |arg| options[arg] == true }.size < 2
  if xml.nil?
    Origen.log.debug 'XML data is nil!'
    return nil
  end
  xml = xml.text if options[:get_text] == true
  # Sometimes XML snippets get sent as nodes or as Strings
  # Must skip this code if a String as it is designed to change
  # the XML node into a string
  unless xml.is_a? String
    if options[:to_html] == true
      if xml.children
        # If there are children to this XML node then grab the content there
        if xml.children.empty? || options[:children] == false
          xml = xml.to_html
        else
          xml = xml.children.to_html
        end
      end
    end
  end
  unless xml.is_a? options[:type]
    Origen.log.debug "XML data is not of correct type '#{options[:type]}'"
    Origen.log.debug "xml is \n#{xml}"
    return nil
  end
  if options[:type] == String
    if xml.match(/\s+/) && options[:whitespace] == false
      Origen.log.debug "XML data '#{xml}' cannot have white space"
      return nil
    end
    xml.downcase! if options[:downcase] == true
    xml = xml.underscore if options[:underscore] == true
    xml.strip! if options[:strip] == true
    xml.squeeze!(' ') if options[:squeeze] == true
    xml = xml.squeeze_lines if options[:squeeze_lines] == true
    xml.gsub!(/[^0-9A-Za-z]/, '_') if options[:rm_specials] == true
    if options[:symbolize] == true
      return xml.to_sym
    elsif options[:to_i] == true
      return xml.to_i
    elsif options[:to_dec] == true
      return xml.to_dec
    elsif options[:to_f] == true
      return xml.to_f
    elsif [true, false].include?(xml.to_bool) && options[:to_bool] == true
      # If the string can convert to Boolean then return TrueClass or FalseClass
      return xml.to_bool
    else
      return xml
    end
  else
    # No real examples yet of non-string content
    return xml
  end
end
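The option compatibility checks at the top of #fetch are easier to follow when separated from the rest of the method. The sketch below restates them; check_fetch_options is a hypothetical helper, and raising ArgumentError is an assumption made for the sketch (the method shown above calls fail with a plain string, which raises RuntimeError).

```ruby
# Hypothetical helper restating the compatibility checks from #fetch.
# Counting the enabled numeric conversions is equivalent to the
# reject { ... }.size comparisons used in the original method.
def check_fetch_options(options)
  numeric_methods = [:to_i, :to_f, :to_dec]
  enabled = numeric_methods.select { |arg| options[arg] == true }

  if options[:get_text] == true && options[:to_html] == true
    fail ArgumentError, 'Cannot use :get_text and :to_html options at the same time'
  end
  if options[:symbolize] == true && !enabled.empty?
    fail ArgumentError, 'Cannot convert to a number of any type and symbolize at the same time'
  end
  if enabled.size > 1
    fail ArgumentError, 'Cannot select multiple numeric conversion args at the same time'
  end
  true
end

check_fetch_options(to_i: true, strip: true) # passes
begin
  check_fetch_options(to_i: true, to_f: true)
rescue ArgumentError => e
  puts e.message
end
```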
#pre_sanitize(html) ⇒ Object
Freescale register descriptions are like the wild west, so some pre-screening is needed to approach valid HTML before handing off to other off-the-shelf sanitizers.
# File 'lib/cross_origen/xml_doc.rb', line 134
def pre_sanitize(html)
  html = Nokogiri::HTML.fragment(html)
  HTML_TRANSFORMS.each do |orig, new|
    html.xpath(".//#{orig}").each { |node| node.name = new }
  end
  html.to_html
end
#to_html(string, _options = {}) ⇒ Object
Converts the given markdown string to HTML.
# File 'lib/cross_origen/xml_doc.rb', line 157
def to_html(string, _options = {})
  # Escape any " that are not already escaped
  string.gsub!(/([^\\])"/, '\1\"')
  # Escape any ' that are not already escaped
  string.gsub!(/([^\\])'/, %q(\1\\\'))
  html = Kramdown::Document.new(string, input: :kramdown).to_html
end
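The two escaping substitutions in #to_html can be exercised without Kramdown. The sketch below uses the same patterns and replacement strings; escape_quotes is a hypothetical helper, and unlike #to_html (which calls gsub! and so modifies the string it is given) it returns a new string.

```ruby
# Hypothetical non-mutating variant of the escaping step in #to_html.
# Each pattern captures the character before the quote so that a quote
# already preceded by a backslash is left alone.
def escape_quotes(string)
  escaped = string.gsub(/([^\\])"/, '\1\"')
  escaped.gsub(/([^\\])'/, %q(\1\\\'))
end

puts escape_quotes('say "hi"') # prints say \"hi\"
puts escape_quotes("it's")     # prints it\'s
```

Because both patterns require a preceding character, a quote at the very start of the string is left unescaped.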
#to_markdown(html, _options = {}) ⇒ Object
Does its best to convert the given HTML fragment to markdown.
The final markdown may still contain some HTML tags, but any weird markup which may break a future markdown -> HTML conversion will be removed.
# File 'lib/cross_origen/xml_doc.rb', line 147
def to_markdown(html, _options = {})
  cleaned = html.scrub
  cleaned = pre_sanitize(cleaned)
  cleaned = Sanitize.fragment(cleaned, HTML_SANITIZATION_CONFIG)
  Kramdown::Document.new(cleaned, input: :html).to_kramdown.strip
rescue
  'The description could not be imported, the most likely cause of this is that it contained illegal HTML markup'
end
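The overall shape of #to_markdown (scrub invalid bytes, run the conversion, fall back to a message on any error) can be sketched with the standard library alone. In the sketch below, convert_with_fallback is a hypothetical name and the block stands in for the sanitize and Kramdown steps, which are not reproduced here.

```ruby
# Hypothetical wrapper showing the scrub-then-rescue shape of #to_markdown.
# String#scrub replaces invalid byte sequences, and the method-level rescue
# turns any StandardError raised by the conversion into a message string.
def convert_with_fallback(html)
  cleaned = html.scrub
  yield cleaned
rescue StandardError
  'The description could not be imported'
end

# The block is a placeholder for the real conversion steps
puts convert_with_fallback("caf\xFF") { |text| text.valid_encoding? } # prints true
puts convert_with_fallback('<p>x</p>') { |_text| raise 'bad markup' }
```

A bare rescue, as used in #to_markdown, also rescues StandardError, so the explicit class here does not change which errors are caught.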
#try(*methods) ⇒ Object
Tries the given methods on the owner and returns the first one to return a value, ultimately returns nil if no value is found.
To test an object other than the owner pass it as the first argument.
# File 'lib/cross_origen/xml_doc.rb', line 70
def try(*methods)
  if methods.first.is_a?(Symbol)
    obj = owner
  else
    obj = methods.shift
  end
  methods.each do |method|
    if obj.respond_to?(method)
      val = obj.send(method)
      return val if val
    end
  end
  nil
end
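The lookup order of #try can be seen with a small stand-alone sketch. Here try_methods is a hypothetical free function that takes the owner explicitly, and the Struct is an invented stand-in for whatever object includes CrossOrigen.

```ruby
# Hypothetical stand-alone version of #try with the owner passed explicitly.
# If the first remaining argument is not a Symbol, it is treated as the
# object to test instead of the owner.
def try_methods(owner, *methods)
  obj = methods.first.is_a?(Symbol) ? owner : methods.shift
  methods.each do |method|
    if obj.respond_to?(method)
      val = obj.send(method)
      return val if val
    end
  end
  nil
end

# Invented stand-in object: name is nil, description has a value
reg = Struct.new(:name, :description).new(nil, 'Control register')

# :full_name is not defined and :name returns nil, so :description wins
puts try_methods(reg, :full_name, :name, :description) # prints Control register
```

Since the value is returned only when it is truthy, a method that returns false is skipped in the same way as one that returns nil.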