Woodstox
  1. Woodstox
  2. WSTX-130

Add optional support for tracking exact location of attribute values (separate from elements)

    Details

    • Type: New Feature New Feature
    • Status: In Progress In Progress
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Number of attachments :
      9

      Description

      (filed based on request by Elnar Hajiyev)

      It would be nice if there was a way to track location information for individual attributes. Currently all attribute events give location of the enclosing element, since tracking of location information for all attributes would cause additional complexity and overhead, and was assumed not be needed by most users.

      The support can be added via Woodstox-specific property, disabled by default. For Event API location could be passed by Woodstox, but for raw cursor API, a new method may need to be added.
      One open question is what to do with attributes that get default value from DTD (and hence there's no attribute within parsed document itself): one possibility would be to give location where attribute (and its default value) was declared in DTD; another to just use element's location. Latter might be less surprising to users.

      1. AttributeCollector.patch
        2 kB
        Marco Hunsicker
      2. BasicStreamReader.patch
        6 kB
        Marco Hunsicker
      3. BasicStreamReader1.patch
        8 kB
        Marco Hunsicker
      4. CompactNsContext.patch
        3 kB
        Marco Hunsicker
      5. InputElementStack.patch
        2 kB
        Marco Hunsicker
      6. NonNsAttributeCollector.patch
        2 kB
        Marco Hunsicker
      7. NonNsInputElementStack.patch
        1 kB
        Marco Hunsicker
      8. NsAttributeCollector.patch
        4 kB
        Marco Hunsicker
      9. NsInputElementStack.patch
        2 kB
        Marco Hunsicker

        Activity

        Hide
        Tatu Saloranta added a comment -

        Hmmh. Implementing this looks quite hairy, due to optimizations done for the specific code path that'd need to be supported (passing distinct Location objects so Element events can be constructed). It is doable, still. I'll have to keep on thinking of how exactly to get it done.

        Show
        Tatu Saloranta added a comment - Hmmh. Implementing this looks quite hairy, due to optimizations done for the specific code path that'd need to be supported (passing distinct Location objects so Element events can be constructed). It is doable, still. I'll have to keep on thinking of how exactly to get it done.
        Hide
        Marco Hunsicker added a comment -

        I need this information as well for my use case. For now, I will adapt the sources to my needs (and don't care much about efficiency), but I would definitely welcome official support. Thanks.

        Show
        Marco Hunsicker added a comment - I need this information as well for my use case. For now, I will adapt the sources to my needs (and don't care much about efficiency), but I would definitely welcome official support. Thanks.
        Hide
        Tatu Saloranta added a comment -

        Thank you for the comment. I haven't had time to look into this (as it wasn't a core thing for 4.0 release), but given interest I'll try to have another look.

        Show
        Tatu Saloranta added a comment - Thank you for the comment. I haven't had time to look into this (as it wasn't a core thing for 4.0 release), but given interest I'll try to have another look.
        Hide
        Marco Hunsicker added a comment -

        Thanks much for stepping up. Location tracking is obviously one thing that does not seem of interest for many applications, and support in Woodstock is a bit weak in this regard.

        Anyways, I've already started to hack in the missing bits. I don't have much of a clue how the whole thing works, but supporting my use case showed to be straight-forward. Problem is that I've imported and re-packaged Woodstox into my repository which makes diffing/patching a bit difficult. I might find the time next week to apply the changes against trunk. Or I could send you the changed files as a whole. The differences are really minor. If you're interested, just tell me your preference. Thanks.

        Show
        Marco Hunsicker added a comment - Thanks much for stepping up. Location tracking is obviously one thing that does not seem of interest for many applications, and support in Woodstock is a bit weak in this regard. Anyways, I've already started to hack in the missing bits. I don't have much of a clue how the whole thing works, but supporting my use case showed to be straight-forward. Problem is that I've imported and re-packaged Woodstox into my repository which makes diffing/patching a bit difficult. I might find the time next week to apply the changes against trunk. Or I could send you the changed files as a whole. The differences are really minor. If you're interested, just tell me your preference. Thanks.
        Hide
        Tatu Saloranta added a comment -

        Any form of diffs should work fine – I will probably be revising bits anyway. But existing code would make it easier to get started. So if you want to attach something to this report that'd be great.

        Btw, location track is something I actually did spend quite a bit of time trying to get right – it gets rather tricky with DTDs and entity expansion. And I was myself disappointed with what other offerings had.
        But there is generally plenty of friction between accuracy of all (potentially rarely needed) information, and efficiency (memory usage, performance).

        Show
        Tatu Saloranta added a comment - Any form of diffs should work fine – I will probably be revising bits anyway. But existing code would make it easier to get started. So if you want to attach something to this report that'd be great. Btw, location track is something I actually did spend quite a bit of time trying to get right – it gets rather tricky with DTDs and entity expansion. And I was myself disappointed with what other offerings had. But there is generally plenty of friction between accuracy of all (potentially rarely needed) information, and efficiency (memory usage, performance).
        Hide
        Marco Hunsicker added a comment -

        Patches against 4.0.1

        Show
        Marco Hunsicker added a comment - Patches against 4.0.1
        Hide
        Marco Hunsicker added a comment -

        Patches against 4.0.1

        Show
        Marco Hunsicker added a comment - Patches against 4.0.1
        Hide
        Marco Hunsicker added a comment -

        Patches against 4.0.1

        Show
        Marco Hunsicker added a comment - Patches against 4.0.1
        Hide
        Marco Hunsicker added a comment -

        I've added unified diffs for the changes I've applied in my source tree. Package names and import declarations therefore always differ - you can ignore most of these differences.

        You would certainly need to adapt the changes to make things fit as a whole. I always need the position information and did not try to support the corresponding configuration property. For my needs knowing the line is currently sufficient - the column information is off by one. And I'm only using the event API, didn't look into the cursor stuff. If you have any questions, please let me know. Thanks much.

        Show
        Marco Hunsicker added a comment - I've added unified diffs for the changes I've applied in my source tree. Package names and import declarations therefore always differ - you can ignore most of these differences. You would certainly need to adapt the changes to make things fit as a whole. I always need the position information and did not try to support the corresponding configuration property. For my needs knowing the line is currently sufficient - the column information is off by one. And I'm only using the event API, didn't look into the cursor stuff. If you have any questions, please let me know. Thanks much.
        Hide
        Tatu Saloranta added a comment -

        Thank you very much! This should definitely be useful.

        Also: if I end up using the code, I will need you to fill & sign the standard contributor agreeement (it's under DEV/ within svn repository). It's pretty standard, nothing unusual, but thought I'll mention it now.

        Show
        Tatu Saloranta added a comment - Thank you very much! This should definitely be useful. Also: if I end up using the code, I will need you to fill & sign the standard contributor agreeement (it's under DEV/ within svn repository). It's pretty standard, nothing unusual, but thought I'll mention it now.
        Hide
        Marco Hunsicker added a comment -

        Thanks for considering the patch. I will send you the scanned agreement via e-mail.

        Show
        Marco Hunsicker added a comment - Thanks for considering the patch. I will send you the scanned agreement via e-mail.
        Hide
        Marco Hunsicker added a comment -

        Column location was not correct, buggy caching could cause unwanted behavior

        Show
        Marco Hunsicker added a comment - Column location was not correct, buggy caching could cause unwanted behavior
        Hide
        Tatu Saloranta added a comment -

        Cool. Quick question: would it be possible to just get a single unified diff for all the changes? It'd be easier for me to patch the diffs that way.

        Show
        Tatu Saloranta added a comment - Cool. Quick question: would it be possible to just get a single unified diff for all the changes? It'd be easier for me to patch the diffs that way.
        Hide
        Marco Hunsicker added a comment -

        Will take some time. As I've renamed the package structure, all files are different if I would simply diff whole directories.

        Best long term solution should be that I checkout Woodstox from SVN and create a local project where I will mirror my patches.

        For now I could send you the whole files, if that would help you. Thanks.

        Show
        Marco Hunsicker added a comment - Will take some time. As I've renamed the package structure, all files are different if I would simply diff whole directories. Best long term solution should be that I checkout Woodstox from SVN and create a local project where I will mirror my patches. For now I could send you the whole files, if that would help you. Thanks.
        Hide
        Tatu Saloranta added a comment -

        Ok. I think diffs are ok then... as long as there's enough context so I can find the location.

        Show
        Tatu Saloranta added a comment - Ok. I think diffs are ok then... as long as there's enough context so I can find the location.
        Hide
        Tatu Saloranta added a comment -

        Whoa. This sure took longer than I thought... but I managed to re-factor attribute handling stuff in 4.1 branch (merged to trunk == 5.0 too), such that it should be easy/ier to pass in Location efficiently.

        What I will do next is to add a property to specify that per-attribute (and namespace decl) location is to be tracked (disabled by default), and then add support code in BasicStreamReader to get Location as needed. AttributeCollector has to take that in too and pass to Attribute. And when constructing StartElement event, need to pass location. Up to this point things should be simple; and beyond that I don't know yet.

        But I thought I'll add a note so that you can have a look at code in case it might make it bit easier to maintain your local changes?

        Show
        Tatu Saloranta added a comment - Whoa. This sure took longer than I thought... but I managed to re-factor attribute handling stuff in 4.1 branch (merged to trunk == 5.0 too), such that it should be easy/ier to pass in Location efficiently. What I will do next is to add a property to specify that per-attribute (and namespace decl) location is to be tracked (disabled by default), and then add support code in BasicStreamReader to get Location as needed. AttributeCollector has to take that in too and pass to Attribute. And when constructing StartElement event, need to pass location. Up to this point things should be simple; and beyond that I don't know yet. But I thought I'll add a note so that you can have a look at code in case it might make it bit easier to maintain your local changes?

          People

          • Assignee:
            Tatu Saloranta
            Reporter:
            Tatu Saloranta
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated: