Class: CrossOrigen::XMLDoc

Inherits:
Object
  • Object
show all
Defined in:
lib/cross_origen/xml_doc.rb

Overview

This is the base class of all doc formats that are XML based

Direct Known Subclasses

CMSISSVD, IpXact

Defined Under Namespace

Classes: CreationInfo, ImportInfo

Constant Summary collapse

HTML_TRANSFORMS =

These (in many cases illegal) tags will be forced to their valid equivalents These will be executed in the defined order, so for later xfrms you can for example assume that all 'rows' have already been converted to 'tr' valid equivalents

{
  'table/title'  => 'caption',
  'table//row'   => 'tr',
  'thead//entry' => 'th',
  'table//entry' => 'td',
  'td/p'         => 'span',
  'th/p'         => 'span'
}
HTML_TRANSFORMER =

This can be used to perform additional by-node transformation if required, normally this should be used if transform of a node attribute is required

lambda do |env|
  if env[:node_name] == 'td' || env[:node_name] == 'th'
    if env[:node].attr('nameend')
      first = env[:node].attr('namest').sub('col', '').to_i
      last = env[:node].attr('nameend').sub('col', '').to_i
      env[:node].set_attribute('colspan', (last - first + 1).to_s)
    end
  end
end
HTML_SANITIZATION_CONFIG =

Defines the rules for sanitization of any HTML strings that will be converted to markdown for representation within Origen

{
  # Only these tags will be allowed through, everything else will be stripped
  # Note that this is applied after the transforms listed above
  elements:     %w(b em i strong u p ul ol li table tr td th tbody thead),
  attributes:   {
    'td' => ['colspan'],
    'th' => ['colspan']
  },
  # Not planning to allow any of these right now, but keeping around
  # as an example of how to do so
  #:protocols => {
  #  'a' => {'href' => ['http', 'https', 'mailto']}
  # }
  transformers: HTML_TRANSFORMER
}

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(owner) ⇒ XMLDoc

Returns a new instance of XMLDoc.



60
61
62
63
64
# File 'lib/cross_origen/xml_doc.rb', line 60

def initialize(owner)
  @owner = owner
  @creation_info = CreationInfo.new
  @import_info = ImportInfo.new
end

Instance Attribute Details

#creation_infoObject

Returns the value of attribute creation_info.



12
13
14
# File 'lib/cross_origen/xml_doc.rb', line 12

def creation_info
  @creation_info
end

#import_infoObject

Returns the value of attribute import_info.



12
13
14
# File 'lib/cross_origen/xml_doc.rb', line 12

def import_info
  @import_info
end

#ownerObject (readonly)

Returns the object that included the CrossOrigen module



58
59
60
# File 'lib/cross_origen/xml_doc.rb', line 58

def owner
  @owner
end

Instance Method Details

#doc(path, options = {}) ⇒ Object

This returns the doc wrapped by a Nokogiri doc



86
87
88
89
90
91
92
# File 'lib/cross_origen/xml_doc.rb', line 86

def doc(path, options = {})
  require 'nokogiri'

  File.open(path) do |f|
    yield Nokogiri::XML(f)
  end
end

#extract(element, path, options = {}) ⇒ Object



94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
# File 'lib/cross_origen/xml_doc.rb', line 94

def extract(element, path, options = {})
  options = {
    format:   :string,
    hex:      false,
    default:  nil,
    downcase: false,
    return:   :text,
    # A value or array or values which are considered to be nil, if this is the value
    # to be returned then nil will be returned instead
    nil_on:   false
  }.merge(options)
  node = element.at_xpath(path)
  if node
    if options[:format] == :string
      str = node.send(options[:return]).strip
      str = str.downcase if options[:downcase]
      if options[:nil_on] && [options[:nil_on]].flatten.include?(str)
        nil
      else
        str
      end
    elsif options[:format] == :integer
      val = node.send(options[:return])
      if val =~ /^0x(.*)/
        Regexp.last_match[1].to_i(16)
      elsif options[:hex]
        val.to_i(16)
      else
        val.to_i(10)
      end
    else
      fail "Unknown format: #{options[:format]}"
    end
  else
    options[:default]
  end
end

#fetch(xml, options = {}) ⇒ Object

fetch an XML snippet passed and extract and format the data



166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
# File 'lib/cross_origen/xml_doc.rb', line 166

def fetch(xml, options = {})
  options = {
    type:          String,
    downcase:      false,
    symbolize:     false,
    strip:         false,
    squeeze:       false,
    squeeze_lines: false,
    rm_specials:   false,
    whitespace:    false,
    get_text:      false,
    to_i:          false,
    to_html:       false,
    to_bool:       false,
    children:      false,
    to_dec:        false,
    to_f:          false,
    underscore:    false
  }.update(options)
  options[:symbolize] = options[:to_sym] if options[:to_sym]
  # Check for incompatible options
  xml_orig = xml
  numeric_methods = [:to_i, :to_f, :to_dec]
  if options[:get_text] == true && options[:to_html] == true
    fail 'Cannot use :get_text and :to_html options at the same time, exiting...'
  end
  if options[:symbolize] == true
    fail 'Cannot convert to a number of any type and symbolize at the same time' if numeric_methods.reject { |arg| options[arg] == true }.size < 3
  end
  fail 'Cannot select multiple numeric conversion args at the same time' if numeric_methods.reject { |arg| options[arg] == true }.size < 2
  if xml.nil?
    Origen.log.debug 'XML data is nil!'
    return nil
  end
  xml = xml.text if options[:get_text] == true
  # Sometimes XML snippets get sent as nodes or as Strings
  # Must skip this code if a String as it is designed to change
  # the XML node into a string
  unless xml.is_a? String
    if options[:to_html] == true
      if xml.children
        # If there are children to this XMl node then grab the content there
        if xml.children.empty? || options[:children] == false
          xml = xml.to_html
        else
          xml = xml.children.to_html
        end
      end
    end
  end
  unless xml.is_a? options[:type]
    Origen.log.debug "XML data is not of correct type '#{options[:type]}'"
    Origen.log.debug "xml is \n#{xml}"
    return nil
  end
  if options[:type] == String
    if xml.match(/\s+/) && options[:whitespace] == false
      Origen.log.debug "XML data '#{xml}' cannot have white space"
      return nil
    end
    xml.downcase! if options[:downcase] == true
    xml = xml.underscore if options[:underscore] == true
    xml.strip! if options[:strip] == true
    xml.squeeze!(' ') if options[:squeeze] == true
    xml = xml.squeeze_lines if options[:squeeze_lines] == true
    xml.gsub!(/[^0-9A-Za-z]/, '_') if options[:rm_specials] == true
    if options[:symbolize] == true
      return xml.to_sym
    elsif options[:to_i] == true
      return xml.to_i
    elsif options[:to_dec] == true
      return xml.to_dec
    elsif options[:to_f] == true
      return xml.to_f
    elsif [true, false].include?(xml.to_bool) && options[:to_bool] == true
      # If the string can convert to Boolean then return TrueClass or FalseClass
      return xml.to_bool
    else
      return xml
    end
  else
    # No real examples yet of non-string content
    return xml
  end
end

#pre_sanitize(html) ⇒ Object

Freescale register descriptions are like the wild west, need to do some pre-screening to approach valid HTML before handing off to other off the shelf sanitizers



134
135
136
137
138
139
140
# File 'lib/cross_origen/xml_doc.rb', line 134

def pre_sanitize(html)
  html = Nokogiri::HTML.fragment(html)
  HTML_TRANSFORMS.each do |orig, new|
    html.xpath(".//#{orig}").each { |node| node.name = new }
  end
  html.to_html
end

#to_html(string, _options = {}) ⇒ Object

Convert the given markdown string to HTML



157
158
159
160
161
162
163
# File 'lib/cross_origen/xml_doc.rb', line 157

def to_html(string, _options = {})
  # Escape any " that are not already escaped
  string.gsub!(/([^\\])"/, '\1\"')
  # Escape any ' that are not already escaped
  string.gsub!(/([^\\])'/, %q(\1\\\'))
  html = Kramdown::Document.new(string, input: :kramdown).to_html
end

#to_markdown(html, _options = {}) ⇒ Object

Does its best to convert the given html fragment to markdown

The final markdown may still contain some HTML tags, but any weird markup which may break a future markdown -> html conversion will be removed



147
148
149
150
151
152
153
154
# File 'lib/cross_origen/xml_doc.rb', line 147

def to_markdown(html, _options = {})
  cleaned = html.scrub
  cleaned = pre_sanitize(cleaned)
  cleaned = Sanitize.fragment(cleaned, HTML_SANITIZATION_CONFIG)
  Kramdown::Document.new(cleaned, input: :html).to_kramdown.strip
rescue
  'The description could not be imported, the most likely cause of this is that it contained illegal HTML markup'
end

#try(*methods) ⇒ Object

Tries the given methods on the owner and returns the first one to return a value, ultimately returns nil if no value is found.

To test an object other than the owner pass it as the first argument.



70
71
72
73
74
75
76
77
78
79
80
81
82
83
# File 'lib/cross_origen/xml_doc.rb', line 70

def try(*methods)
  if methods.first.is_a?(Symbol)
    obj = owner
  else
    obj = methods.shift
  end
  methods.each do |method|
    if obj.respond_to?(method)
      val = obj.send(method)
      return val if val
    end
  end
  nil
end