Matthew Walker mattgwwalker Auckland, New Zealand

mattgwwalker/skyspark-docs 4

User-contributed documentation for SkySpark

mattgwwalker/nz-bank-credit-ratings 3

New Zealand Bank Credit Ratings

mattgwwalker/skyspark-ahu-performance-assessment-rules 3

SkySpark: AHU Performance Assessment Rules

mattgwwalker/hof3-server 1

HTTP and EventSource Server and Interface to HOF3, a Membrane Processing Plant

mattgwwalker/skyspark-simultaneous-cool-and-heat 1

SkySpark: Pod demonstrating testing of Axon-based function to detect simultaneous cooling and heating

mattgwwalker/align-videos-by-sound 0

Align videos/sound files timewise with help of their soundtracks

mattgwwalker/automated-testing-example 0

Simple example showing automated testing in Python

mattgwwalker/cs231n.github.io 0

Public facing notes page

mattgwwalker/eventkit-testing 0

Automated Testing for EventKit Demo

mattgwwalker/evidence-based-scheduling 0

Evidence-Based Scheduling Web App

issue closed TeamMsgExtractor/msg-extractor

A typo in attachment.py is causing it to crash

In order to get your bug addressed in a timely manner, or at all :smiley:, please fill out the below bug report. Please try to make it as easy as possible for us to understand what is going on. We may close out any bugs or issues without warning that are not complete or coherent.

In the bug template below, anything in [square brackets] should be filled out or removed if the item doesn't apply.

Should you encounter an error that has not already been reported, please do the following when reporting it:

Bug Metadata

  • Version of extract_msg: 0.28.0
  • Your python version: Python 3.8.7
  • How did you launch extract_msg?
    • [ ] My command line or
    • [x] I used the extract_msg package

Describe the bug

The extract_msg package crashes in the Attachment class of the attachment module, on line 44 (https://github.com/TeamMsgExtractor/msg-extractor/blob/1de3677e9a542105238d57a2976b7d23c856207f/extract_msg/attachment.py#L44), when trying to open a message with an afByWebReference attachment. Accessing self.__props['37050003'].value raises an AttributeError. Removing the __ seems to fix the problem.

What code did you use or can we use to reproduce this error?

import sys
import traceback
from extract_msg import Message, constants

try:
    m1 = Message("/home/myuser/test_mail/mail_w_webattachment.msg", attachmentErrorBehavior=constants.ATTACHMENT_ERROR_NOT_IMPLEMENTED)
    print(m1.to)
except Exception:
    traceback.print_exception(*sys.exc_info())

Is there a message.msg file you want to share to help us reproduce this?

  • [ ] Uploaded message (drag and drop on this window)
  • [ ] Emailed message as an attachment to admins: [Enter Subject Line Here]
  • [x] The message is confidential, but I could produce one.

Traceback

Traceback (most recent call last):
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message_base.py", line 142, in attachments
    return self._attachments
AttributeError: 'Message' object has no attribute '_attachments'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/playground.py", line 6, in <module>
    m1 = Message("/home/myuser/test_mail/mail_w_webattachment.msg", attachmentErrorBehavior=constants.ATTACHMENT_ERROR_NOT_IMPLEMENTED)
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message.py", line 28, in __init__
    MessageBase.__init__(self, path, prefix, attachmentClass, filename, delayAttachments, overrideEncoding, attachmentErrorBehavior)
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message_base.py", line 56, in __init__
    self.attachments
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message_base.py", line 156, in attachments
    self._attachments.append(self.attachmentClass(self, attachmentDir))
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/attachment.py", line 44, in __init__
    elif (self.__props['37050003'].value & 0x7) == 0x7:
AttributeError: 'Attachment' object has no attribute '_Attachment__props'
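The `_Attachment__props` name in the last line of the traceback comes from Python's private name mangling: inside a class body, `self.__name` is rewritten to `self._ClassName__name`. An attribute set as `self.__props` in a base class is therefore stored under the base class's mangled name, and `self.__props` in a subclass looks for a different attribute. A minimal sketch of that failure mode follows; the class bodies are hypothetical (only the names `AttachmentBase`, `Attachment` and the `'37050003'` key come from this thread), and plain ints stand in for the library's property objects.

```python
class AttachmentBase:
    """Hypothetical base class that owns the property table."""

    def __init__(self, props):
        # Stored as self._AttachmentBase__props because of name mangling.
        self.__props = props

    @property
    def props(self):
        # Non-mangled accessor that subclasses can use.
        return self.__props


class Attachment(AttachmentBase):
    """Subclass still written as if it owned the private attribute (the bug)."""

    def attach_method(self):
        # Compiled as self._Attachment__props, which was never set.
        return self.__props['37050003'] & 0x7


class PatchedAttachment(AttachmentBase):
    """Hypothetical fix: read through the inherited accessor instead."""

    def attach_method(self):
        return self.props['37050003'] & 0x7
```

Dropping the double underscore, as the reporter found, or reading through an accessor shared by the base class avoids the mismatch.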

Screenshots

None

Additional context

None

closed 6 days ago

z3r0privacy

issue comment TeamMsgExtractor/msg-extractor

A typo in attachment.py is causing it to crash

I've just tested it. Looks like everything is fine and all tests have passed, thank you! The NotImplementedError is not that bad since it rarely happens, and thanks to the new UnsupportedAttachment class, handling it is even easier.

z3r0privacy

comment created 6 days ago

issue comment TeamMsgExtractor/msg-extractor

A typo in attachment.py is causing it to crash

I think I found all the problems and fixed them in the newest release. Let me know if 0.28.1 still has issues.

z3r0privacy

comment created 6 days ago

created tag TeamMsgExtractor/msg-extractor

tag v0.28.1

Extracts emails and attachments saved in Microsoft Outlook's .msg files

created 6 days ago

release TeamMsgExtractor/msg-extractor

v0.28.1

released 6 days ago

push event TeamMsgExtractor/msg-extractor

TheElementalOfDestruction

commit sha 92b4a1178d8cb1c487c6e428e7d712542c42babd

v0.28.1

Destiny Peterson

commit sha 60a8ad3c3b97645abfad75efc7333a366ba4c405

Merge pull request #182 from TheElementalOfDestruction/master v0.28.1

pushed 6 days ago

PR merged TeamMsgExtractor/msg-extractor

v0.28.1

v0.28.1

  • [TeamMsgExtractor #181] Fixed issue in Attachment that arose when moving some of the code to a base class.
  • Fixed small error in utils.parseType that caused it to compare the expected and actual lengths incorrectly. Fortunately, this had no effect aside from a warning.
  • Added the ebcdic module to the requirements to add more supported encodings.
+38 -55

0 comments

8 changed files

TheElementalOfDestruction

PR closed 6 days ago

PR opened TeamMsgExtractor/msg-extractor

v0.28.1

PR created 6 days ago

issue comment TeamMsgExtractor/msg-extractor

A typo in attachment.py is causing it to crash

Unfortunately, if you are getting to that point, you are likely about to get either a NotImplementedError or a TypeError anyway.

z3r0privacy

comment created 6 days ago

issue comment TeamMsgExtractor/msg-extractor

A typo in attachment.py is causing it to crash

Ack, I knew I would miss one of them. Some of the stuff from Attachment got moved into a base class, so stuff like this was bound to happen. Give me a few minutes and I'll put out a fix.

z3r0privacy

comment created 6 days ago

issue opened TeamMsgExtractor/msg-extractor

A typo in attachment.py is causing it to crash

created 6 days ago

issue comment TeamMsgExtractor/msg-extractor

NotImplementedError: _getTypedStream 1014

Try running the following code for one (or more) of the files that are having problems. This should show me what is going wrong in the code. This specific code will add more information to the log as well as output it directly to the console. If you already have some code to output to a log file, you can comment out the specified lines. Please provide the full log that this produces:

import extract_msg
import extract_msg.utils
from extract_msg import constants

import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

# ADD CONSOLE LOG HANDLER
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
# END SECTION

class Message(extract_msg.Message):
    def _getTypedStream(self, filename, prefix = True, _type = None):
        """
        Override function. This should work.
        """
        extract_msg.utils.verifyType(_type)
        filename = self.fix_path(filename, prefix)
        for x in (filename + _type,) if _type is not None else self.slistDir():
            if x.startswith(filename) and x.find('-') == -1:
                contents = self._getStream(x, False)
                if len(contents) == 0:
                    return True, None # We found the file, but it was empty.
                extras = []
                _type = x[-4:]
                if x[-4] == '1': # It's a multiple
                    if _type in ('101F', '101E'):
                        streams = len(contents) // 4 # These lengths are normal.
                    elif _type == '1102':
                        streams = len(contents) // 8 # These lengths have 4 0x00 bytes at the end for seemingly no reason. They are "reserved" bytes
                    elif _type in ('1002', '1003', '1004', '1005', '1007', '1014', '1040', '1048'):
                        try:
                            streams = self.mainProperties[x[-8:]].realLength
                        except Exception:
                            root.error('Could not find matching VariableLengthProp for stream {}'.format(x))
                            streams = len(contents) // (2 if _type in constants.MULTIPLE_2_BYTES else 4 if _type in constants.MULTIPLE_4_BYTES else 8 if _type in constants.MULTIPLE_8_BYTES else 16)
                    else:
                        raise NotImplementedError('The stream specified is of type {}. We don\'t currently understand exactly how this type works. If it is mandatory that you have the contents of this stream, please create an issue labeled "NotImplementedError: _getTypedStream {}".'.format(_type, _type))
                    if _type in ('101F', '101E', '1102'):
                        if self.Exists(x + '-00000000', False):
                            for y in range(streams):
                                if self.Exists(x + '-' + extract_msg.utils.properHex(y, 8), False):
                                    extras.append(self._getStream(x + '-' + extract_msg.utils.properHex(y, 8), False))
                    elif _type in ('1002', '1003', '1004', '1005', '1007', '1014', '1040', '1048'):
                        extras = extract_msg.utils.divide(contents, (2 if _type in constants.MULTIPLE_2_BYTES else 4 if _type in constants.MULTIPLE_4_BYTES else 8 if _type in constants.MULTIPLE_8_BYTES else 16))
                        contents = streams
                if _type == '1014':
                    root.debug(contents)
                    root.debug(extras)
                try:
                    return True, extract_msg.utils.parseType(int(_type, 16), contents, self.stringEncoding, extras)
                except Exception as e:
                    root.exception(e)
                    raise
        return False, None # We didn't find the stream.

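MESSAGE_PATH = 'path/to/failing_message.msg'  # Placeholder: set this to one of the files that fails.
msg = Message(MESSAGE_PATH)
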
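The override above splits the fixed-size multiple-valued types it lists (1002, 1003, 1004, 1005, 1007, 1014, 1040, 1048) into equal-width items with `extract_msg.utils.divide`, choosing a 2, 4, 8 or 16-byte width from the `constants.MULTIPLE_*_BYTES` groups. The standalone sketch below shows that step for type 1014. It assumes, from the MS-OXCDATA property type definitions, that 1014 is PtypMultipleInteger64 and that items are stored back to back as little-endian values; `ITEM_WIDTH`, `divide` and `parse_multiple_int64` are hypothetical stand-ins, not the library's own functions.

```python
import struct

# Assumed item widths (bytes) for fixed-size multiple-valued property types,
# based on the MS-OXCDATA type definitions.
ITEM_WIDTH = {
    '1002': 2,   # PtypMultipleInteger16
    '1003': 4,   # PtypMultipleInteger32
    '1004': 4,   # PtypMultipleFloating32
    '1005': 8,   # PtypMultipleFloating64
    '1007': 8,   # PtypMultipleFloatingTime
    '1014': 8,   # PtypMultipleInteger64
    '1040': 8,   # PtypMultipleTime
    '1048': 16,  # PtypMultipleGuid
}


def divide(data, width):
    """Split data into consecutive chunks of `width` bytes (hypothetical helper)."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def parse_multiple_int64(data, expected_count=None):
    """Decode a type-1014 stream into a list of signed 64-bit integers."""
    width = ITEM_WIDTH['1014']
    chunks = divide(data, width)
    if chunks and len(chunks[-1]) != width:
        raise ValueError('stream length is not a multiple of {} bytes'.format(width))
    values = [struct.unpack('<q', chunk)[0] for chunk in chunks]
    # Optional count check, similar in spirit to comparing against realLength.
    if expected_count is not None and expected_count != len(values):
        raise ValueError('expected {} values, got {}'.format(expected_count, len(values)))
    return values
```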
pratiknht

comment created 6 days ago

issue comment TeamMsgExtractor/msg-extractor

NotImplementedError: _getTypedStream 1014

Still facing the issue. Plz refer to the logs

[Screenshot of the logs]

pratiknht

comment created 6 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

Yes - those options work well. I see you've actually documented most things pretty well in the codebase itself - I just had to actually read it!

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

If you don't care about any text in messages that are attached to the one you are parsing, and only care about the body, I would highly recommend setting that first option (delayAttachments) to True. It makes things a lot faster.

Around 0.29.0, I'm going to try to rewrite some of the underlying code so that things aren't all fully initialized as soon as the class is created. Many people don't actually use all of the data that is loaded, so waiting to load some of it until it is needed is probably the smartest way to do things. Speed is currently a problem for this module.
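Deferring work until the data is actually used, as described above (and, more narrowly, as delayAttachments already does for attachments), is a lazy-initialization pattern. One way to sketch it in Python 3.8+ is `functools.cached_property`, which runs the getter on first access and stores the result on the instance. `LazyMessage` and its string "parsing" are hypothetical; this is not how extract_msg is currently structured.

```python
from functools import cached_property


class LazyMessage:
    """Hypothetical message wrapper that defers expensive parsing until first use."""

    def __init__(self, raw_attachments):
        self._raw_attachments = raw_attachments
        self.parse_count = 0  # counts how often the expensive step actually runs

    @cached_property
    def attachments(self):
        # Runs only on first access; later accesses return the cached list.
        self.parse_count += 1
        return [self._parse(raw) for raw in self._raw_attachments]

    @staticmethod
    def _parse(raw):
        # Stand-in for real attachment parsing.
        return raw.upper()
```

Code that only reads the body never touches `attachments`, so the parsing cost is never paid.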

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

Nice one - I'll give it a try. I'm bench-testing various ways to extract data from about 100,000 messages.

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

You actually can. (Honestly, I need to write some actual documentation, but that would take a while, and I am much better at writing code than at documenting it.) There are 2 options (before 0.28.0 there was only 1) that allow you to do this. The first is delayAttachments, which delays the parsing of attachments until you actively try to retrieve them. The second is attachmentErrorBehavior, which was just added. With this option, you can specify how you want the class to deal with attachment errors. You currently have 3 choices: don't catch errors, catch NotImplementedErrors, or catch all errors.
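The three attachment error-handling choices described above map onto a simple try/except dispatch. The sketch below models that behavior only; the constant values, the stand-in `UnsupportedAttachment`/`BrokenAttachment` classes and `load_attachment` are all hypothetical, so real code should pass the actual constants from `extract_msg.constants` to the message class instead.

```python
import logging

# Hypothetical stand-ins; the real constants live in extract_msg.constants.
ATTACHMENT_ERROR_THROW = 0
ATTACHMENT_ERROR_NOT_IMPLEMENTED = 1
ATTACHMENT_ERROR_BROKEN = 2

logger = logging.getLogger(__name__)


class UnsupportedAttachment:
    def __init__(self, error):
        self.error = error


class BrokenAttachment:
    def __init__(self, error):
        self.error = error


def load_attachment(parse, behavior=ATTACHMENT_ERROR_THROW):
    """Call parse() and apply the chosen error-handling policy."""
    try:
        return parse()
    except NotImplementedError as e:
        if behavior == ATTACHMENT_ERROR_THROW:
            raise  # "don't catch errors"
        logger.exception('Unsupported attachment')
        return UnsupportedAttachment(e)
    except Exception as e:
        if behavior != ATTACHMENT_ERROR_BROKEN:
            raise  # only the "catch all errors" choice swallows other exceptions
        logger.exception('Broken attachment')
        return BrokenAttachment(e)
```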

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

Great stuff. On that other issue, as a workaround: if it is a case of weird attachments, why not just parse the text body but ignore the attachment with a warning? That way you'd still get something.

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

Lol, whoops. I made a small mistake in the code. While it won't actually affect anything other than a warning, it's not a good thing to have. That's why that warning is appearing: I forgot to unpack the value from the returned tuple.
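The warning quoted in this thread ("Expected length (0,), got 0") is the symptom of comparing a tuple to an int: `struct.unpack` always returns a tuple, even for a single field, so the comparison is never equal. A minimal reproduction, assuming the expected length was read with `struct.unpack` (the actual extract_msg code isn't shown here, and `length_mismatch` is hypothetical):

```python
import struct


def length_mismatch(header, actual, unpack_first=True):
    """Return a warning message if the length stored in `header` differs from `actual`."""
    # struct.unpack always returns a tuple, even for one field, e.g. (0,)
    expected = struct.unpack('<I', header)
    if unpack_first:
        # The fix: take the single value out of the tuple before comparing.
        expected = expected[0]
    if expected != actual:
        return 'Expected length {}, got {}. Ignoring.'.format(expected, actual)
    return None
```

Passing the returned message to a logger's `warning` method would produce a log line like the one reported here.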

As for the NotImplementedError, yes, that is a known issue. It happens because that kind of attachment could take any format: its contents are application-specific rather than defined by the file format.

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

NameError in utils.py

Looks like it is fixed now, thanks. Returns:

2021-01-07 16:59:48,105 - root - DEBUG - 1102
2021-01-07 16:59:48,105 - root - DEBUG - b'\x00\x00\x00\x00\x00\x00\x00\x00'
2021-01-07 16:59:48,106 - extract_msg.utils - WARNING - Error while parsing multiple type. Expected length (0,), got 0. Ignoring.

But the body of the msg is parsed correctly.

(NB: that deals with the vast majority of msgs that were failing. The only remaining issue is the ones that throw "NotImplementedError: Current version of extract_msg does not support extraction of containers that are not embedded msg files." But that looks like a known issue.)

StephenGrey

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

extract_msg returned "command not found" after installing via pip

I have received confirmation that this should now be working in version 0.28.0. If I don't get a response in the negative within the week, I will assume you are having no more problems and close this issue.

haruyosh

comment created 11 days ago

issue closed TeamMsgExtractor/msg-extractor

Add a wrapper script in the bin directory to make it easier to run extract_msg

Bug Metadata

  • Version of extract_msg: 0.27.8
  • Your python version: Python 3.9.0
  • How did you launch extract_msg?
    • python3 -m extract_msg

Describe the bug

The pip package doesn't install a script in the bin directory (or maybe that doesn't happen on Python 3.9 or Fedora for some reason). I have to use python3 -m extract_msg ... to run the main script. It would be nice if installing the wheel file via pip also installed a wrapper script in the normal bin directory to make it easier to launch. ~/.local/bin, for example, is normally added to your PATH.

closed 11 days ago

zeroepoch

issue comment TeamMsgExtractor/msg-extractor

Add a wrapper script in the bin directory to make it easier to run extract_msg

Thanks @TheElementalOfDestruction! It's working as expected now. I see you were also able to remove the wrapper script as well which is nice.

zeroepoch

comment created 11 days ago

release TeamMsgExtractor/msg-extractor

v0.28.0

released 11 days ago

created tag TeamMsgExtractor/msg-extractor

tag v0.28.0

Extracts emails and attachments saved in Microsoft Outlook's .msg files

created 11 days ago

push event TeamMsgExtractor/msg-extractor

TheElementalOfDestruction

commit sha 9b919f55332a01ee5e7e32c7f680100bc49b1775

v0.28.0

Destiny Peterson

commit sha 1de3677e9a542105238d57a2976b7d23c856207f

Merge pull request #180 from TheElementalOfDestruction/master v0.28.0

pushed 11 days ago

PR merged TeamMsgExtractor/msg-extractor

v0.28.0

v0.28.0

  • [TeamMsgExtractor #87] Added a new system to handle NotImplementedError and other exceptions. All msg classes now have an option called attachmentErrorBehavior that tells the class what to do if it has an error. The value should be one of three constants: ATTACHMENT_ERROR_THROW, ATTACHMENT_ERROR_NOT_IMPLEMENTED, or ATTACHMENT_ERROR_BROKEN. ATTACHMENT_ERROR_THROW tells the class not to catch any exceptions and to let the user handle them. ATTACHMENT_ERROR_NOT_IMPLEMENTED tells the class to catch NotImplementedError exceptions and put an instance of UnsupportedAttachment in place of a regular attachment. ATTACHMENT_ERROR_BROKEN tells the class to catch all exceptions and replace the attachment with an UnsupportedAttachment if the exception is a NotImplementedError, or with a BrokenAttachment for any other exception. With either of the last two options, caught exceptions are logged.
  • In making the previous point work, much code from Attachment has been moved to a new class called AttachmentBase. Both BrokenAttachment and UnsupportedAttachment are subclasses of AttachmentBase meaning data can be extracted from their streams in the same way as a functioning attachment.
  • [TeamMsgExtractor #162] Pretty sure I actually got it this time. The execution flag should be applied by pip now.
  • Fixed typos in some exceptions
+290 -208
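Moving the shared stream-reading code into AttachmentBase is what lets UnsupportedAttachment and BrokenAttachment expose their raw streams the same way as a working Attachment. It also lets calling code skip the placeholders while still keeping the message body, as suggested in this thread. The sketch below illustrates that design; every class body, the `get_stream` method, the `usable_attachments` helper and the assumed `__substg1.0_37010102` (attachment data) stream name are hypothetical, not extract_msg's actual API.

```python
class AttachmentBase:
    """Hypothetical shared base: stream access lives here, so every subclass has it."""

    def __init__(self, streams):
        self._streams = streams  # mapping of stream name -> bytes

    def get_stream(self, name):
        # Returns None when the stream is missing.
        return self._streams.get(name)


class Attachment(AttachmentBase):
    def __init__(self, streams):
        super().__init__(streams)
        # Assumed stream name for the binary attachment data.
        self.data = self.get_stream('__substg1.0_37010102')


class UnsupportedAttachment(AttachmentBase):
    pass


class BrokenAttachment(AttachmentBase):
    pass


def usable_attachments(attachments):
    """Keep fully parsed attachments and report the placeholders by class name."""
    ok, skipped = [], []
    for att in attachments:
        if isinstance(att, (UnsupportedAttachment, BrokenAttachment)):
            skipped.append(type(att).__name__)
        else:
            ok.append(att)
    return ok, skipped
```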

0 comments

14 changed files

TheElementalOfDestruction

PR closed 11 days ago

issue comment TeamMsgExtractor/msg-extractor

Non msg containers embedded as attachments cause all attachments to fail

Version 0.28.0 has been released with support for this feature. All msg classes now have an additional option, attachmentErrorBehavior, which can be used to specify how attachment exceptions are handled.

0x3a

comment created 11 days ago

issue comment TeamMsgExtractor/msg-extractor

Add a wrapper script in the bin directory to make it easier to run extract_msg

Thanks to @staticaland, I figured out how to do this (I think). Based on some tests that I did, this should now be completely fixed in 0.28.0.

zeroepoch

comment created 11 days ago

PR opened TeamMsgExtractor/msg-extractor

v0.28.0

PR created 11 days ago
