User-contributed documentation for SkySpark
mattgwwalker/nz-bank-credit-ratings 3
New Zealand Bank Credit Ratings
mattgwwalker/skyspark-ahu-performance-assessment-rules 3
SkySpark: AHU Performance Assessment Rules
HTTP and EventSource Server and Interface to HOF3, a Membrane Processing Plant
mattgwwalker/skyspark-simultaneous-cool-and-heat 1
SkySpark: Pod demonstrating testing of Axon-based function to detect simultaneous cooling and heating
mattgwwalker/align-videos-by-sound 0
Align videos/sound files timewise with help of their soundtracks
mattgwwalker/automated-testing-example 0
Simple example showing automated testing in Python
mattgwwalker/cs231n.github.io 0
Public facing notes page
mattgwwalker/eventkit-testing 0
Automated Testing for EventKit Demo
mattgwwalker/evidence-based-scheduling 0
Evidence-Based Scheduling Web App
issue closed TeamMsgExtractor/msg-extractor
A typo in attachment.py is causing it to crash
In order to get your bug addressed in a timely manner, or at all :smiley:, please fill out the below bug report. Please try to make it as easy as possible for us to understand what is going on. We may close out any bugs or issues without warning that are not complete or coherent.
In the bug template below anything in [square brackets] should be filled out or removed if the item doesn't apply.
Should you encounter an error that has not already been reported, please do the following when reporting it:
Bug Metadata
- Version of extract_msg: 0.28.0
- Your python version: Python 3.8.7
- How did you launch extract_msg?
- [ ] My command line or
- [x] I used the extract_msg package
Describe the bug
The package extract_msg crashes in module attachment, class Attachment, on line 44 https://github.com/TeamMsgExtractor/msg-extractor/blob/1de3677e9a542105238d57a2976b7d23c856207f/extract_msg/attachment.py#L44 when trying to open a message with an afByWebReference attachment. Accessing self.__props['37050003'].value raises an AttributeError. Removing the __ seems to fix the problem.
What code did you use or can we use to reproduce this error?
import sys
import traceback
from extract_msg import Message, constants
try:
    m1 = Message("/home/myuser/test_mail/mail_w_webattachment.msg", attachmentErrorBehavior=constants.ATTACHMENT_ERROR_NOT_IMPLEMENTED)
    print(m1.to)
except Exception:
    traceback.print_exception(*sys.exc_info())
Is there a message.msg file you want to share to help us reproduce this?
- [ ] Uploaded message (drag and drop on this window)
- [ ] Emailed message as an attachment to admins: [Enter Subject Line Here]
- [x] The message is confidential, but I could produce one.
Traceback
Traceback (most recent call last):
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message_base.py", line 142, in attachments
    return self._attachments
AttributeError: 'Message' object has no attribute '_attachments'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/playground.py", line 6, in <module>
    m1 = Message("/home/myuser/test_mail/mail_w_webattachment.msg", attachmentErrorBehavior=constants.ATTACHMENT_ERROR_NOT_IMPLEMENTED)
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message.py", line 28, in __init__
    MessageBase.__init__(self, path, prefix, attachmentClass, filename, delayAttachments, overrideEncoding, attachmentErrorBehavior)
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message_base.py", line 56, in __init__
    self.attachments
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/message_base.py", line 156, in attachments
    self._attachments.append(self.attachmentClass(self, attachmentDir))
  File "/home/myuser/dev/msgextract_test/msg-extractor-master/extract_msg/attachment.py", line 44, in __init__
    elif (self.__props['37050003'].value & 0x7) == 0x7:
AttributeError: 'Attachment' object has no attribute '_Attachment__props'
Screenshots
None
Additional context
None
closed time in 6 days
z3r0privacy issue comment TeamMsgExtractor/msg-extractor
A typo in attachment.py is causing it to crash
I've just tested it. Looks like everything is fine, all tests have passed, thank you!
The NotImplementedError is not that bad since it rarely happens, and thanks to the new UnsupportedAttachment class the handling is even easier.
comment created time in 6 days
issue comment TeamMsgExtractor/msg-extractor
A typo in attachment.py is causing it to crash
I think I found all the problems and fixed them in the newest release. Let me know if 0.28.1 still has issues.
comment created time in 6 days
created tag TeamMsgExtractor/msg-extractor
Extracts emails and attachments saved in Microsoft Outlook's .msg files
created time in 6 days
push event TeamMsgExtractor/msg-extractor
commit sha 92b4a1178d8cb1c487c6e428e7d712542c42babd
v0.28.1
commit sha 60a8ad3c3b97645abfad75efc7333a366ba4c405
Merge pull request #182 from TheElementalOfDestruction/master v0.28.1
push time in 6 days
PR merged TeamMsgExtractor/msg-extractor
v0.28.1
- [TeamMsgExtractor #181] Fixed issue in Attachment that arose when moving some of the code to a base class.
- Fixed small error in utils.parse_type that caused it to incorrectly compare expected and actual length. Fortunately, this had no actual effect aside from a warning.
- Added the ebcdic module to the requirements to add more supported encodings.
pr closed time in 6 days
PR opened TeamMsgExtractor/msg-extractor
v0.28.1
pr created time in 6 days
issue comment TeamMsgExtractor/msg-extractor
A typo in attachment.py is causing it to crash
Unfortunately, if you are getting to that point you are likely about to either get a NotImplementedError or a TypeError anyways.
comment created time in 6 days
issue comment TeamMsgExtractor/msg-extractor
A typo in attachment.py is causing it to crash
Ack, I knew I would miss one of them. Some of the stuff from Attachment got moved into a base class, so stuff like this was bound to happen. Give me a few minutes and I'll put out a fix.
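The crash follows from Python's name mangling of double-underscore attributes: inside a class body, self.__props is rewritten to include the name of the class where the code appears, so an attribute set in a base class is not visible under the same spelling in a subclass. A minimal sketch of the effect is below. The classes are toy stand-ins that only borrow the names from this issue, and the props accessor is a hypothetical fix for illustration, not the library's actual code.

```python
class AttachmentBase:
    def __init__(self):
        # Inside AttachmentBase this is stored as _AttachmentBase__props.
        self.__props = {'37050003': 7}

    @property
    def props(self):
        # Hypothetical accessor that exposes the mangled attribute.
        return self.__props


class Attachment(AttachmentBase):
    def flags_broken(self):
        # Inside Attachment, self.__props means self._Attachment__props,
        # which was never assigned, so this raises AttributeError.
        return self.__props['37050003']

    def flags_fixed(self):
        # Going through the base-class accessor avoids the mangled name.
        return self.props['37050003']


att = Attachment()
try:
    att.flags_broken()
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(att.flags_fixed())
```

The error text names _Attachment__props, the same mangled name that appears in the traceback of the report.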
comment created time in 6 days
issue opened TeamMsgExtractor/msg-extractor
A typo in attachment.py is causing it to crash
created time in 6 days
issue comment TeamMsgExtractor/msg-extractor
NotImplementedError: _getTypedStream 1014
Try running the following code for one (or more) of the files that are having problems. This should show me what is going wrong in the code. This specific code will add more information to the log as well as output it directly to the console. If you already have some code to output to a log file, you can comment out the specified lines. Please provide the full log that this produces:
import extract_msg
import extract_msg.utils
from extract_msg import constants
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

# ADD CONSOLE LOG HANDLER
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
# END SECTION

class Message(extract_msg.Message):
    def _getTypedStream(self, filename, prefix = True, _type = None):
        """
        Override function. This should work.
        """
        extract_msg.utils.verifyType(_type)
        filename = self.fix_path(filename, prefix)
        for x in (filename + _type,) if _type is not None else self.slistDir():
            if x.startswith(filename) and x.find('-') == -1:
                contents = self._getStream(x, False)
                if len(contents) == 0:
                    return True, None # We found the file, but it was empty.
                extras = []
                _type = x[-4:]
                if x[-4] == '1': # It's a multiple
                    if _type in ('101F', '101E'):
                        streams = len(contents) // 4 # These lengths are normal.
                    elif _type == '1102':
                        streams = len(contents) // 8 # These lengths have 4 0x00 bytes at the end for seemingly no reason. They are "reserved" bytes
                    elif _type in ('1002', '1003', '1004', '1005', '1007', '1014', '1040', '1048'):
                        try:
                            streams = self.mainProperties[x[-8:]].realLength
                        except:
                            root.error('Could not find matching VariableLengthProp for stream {}'.format(x))
                            streams = len(contents) // (2 if _type in constants.MULTIPLE_2_BYTES else 4 if _type in constants.MULTIPLE_4_BYTES else 8 if _type in constants.MULTIPLE_8_BYTES else 16)
                    else:
                        raise NotImplementedError('The stream specified is of type {}. We don\'t currently understand exactly how this type works. If it is mandatory that you have the contents of this stream, please create an issue labled "NotImplementedError: _getTypedStream {}".'.format(_type, _type))
                    if _type in ('101F', '101E', '1102'):
                        if self.Exists(x + '-00000000', False):
                            for y in range(streams):
                                if self.Exists(x + '-' + extract_msg.utils.properHex(y, 8), False):
                                    extras.append(self._getStream(x + '-' + extract_msg.utils.properHex(y, 8), False))
                    elif _type in ('1002', '1003', '1004', '1005', '1007', '1014', '1040', '1048'):
                        extras = extract_msg.utils.divide(contents, (2 if _type in constants.MULTIPLE_2_BYTES else 4 if _type in constants.MULTIPLE_4_BYTES else 8 if _type in constants.MULTIPLE_8_BYTES else 16))
                        contents = streams
                if _type == '1014':
                    root.debug(contents)
                    root.debug(extras)
                try:
                    return True, extract_msg.utils.parseType(int(_type, 16), contents, self.stringEncoding, extras)
                except Exception as e:
                    root.exception(e)
                    raise
        return False, None # We didn't find the stream.

msg = Message(MESSAGE_PATH)
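For the fixed-width multiple types, the override above splits the stream contents into equal-sized entries before parsing. The idea can be sketched on its own as follows. The helper divide_chunks is a hypothetical stand-in rather than the library's utils.divide, and treating type 1014 entries as 8-byte little-endian signed integers is an assumption made for this illustration.

```python
import struct


def divide_chunks(data: bytes, size: int) -> list:
    """Split data into consecutive pieces of `size` bytes each."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Build a sample stream of three 8-byte integers (assumed layout).
raw = struct.pack('<3q', 1, -2, 3)

# Split into 8-byte entries, then decode each entry separately.
chunks = divide_chunks(raw, 8)
values = [struct.unpack('<q', chunk)[0] for chunk in chunks]

print(len(chunks), values)
```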
comment created time in 6 days
issue comment TeamMsgExtractor/msg-extractor
NotImplementedError: _getTypedStream 1014
Still facing the issue. Please refer to the logs.
comment created time in 6 days
issue comment TeamMsgExtractor/msg-extractor
Yes, those options work well. I see you've actually documented most things pretty well in the codebase; I just had to actually read it!
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
If you don't care about any text in messages that are attached to the one you are parsing and only care about the body, I would highly recommend setting that first option to true. It makes things a lot faster.
Around 0.29.0 I'm going to try to rewrite some of the underlying code so that things aren't fully initialized the moment the class is first created. Many people don't actually use all of the data that is loaded, so deferring some of it until it is needed is probably the smartest way to do things. Speed is currently a problem for this module.
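The deferred loading described here can be sketched with functools.cached_property (Python 3.8 and later), which runs the expensive step on first access and then reuses the result. LazyMessage and its fields are hypothetical names for illustration only; this is not how extract_msg is implemented.

```python
from functools import cached_property


class LazyMessage:
    def __init__(self, raw_attachments):
        # Keep only the raw data at construction time; parse nothing yet.
        self._raw_attachments = raw_attachments
        self.parse_count = 0

    @cached_property
    def attachments(self):
        # Runs once, on first access; the result is cached on the instance.
        self.parse_count += 1
        return [name.upper() for name in self._raw_attachments]


msg = LazyMessage(["a.txt", "b.pdf"])
count_before_access = msg.parse_count  # nothing parsed yet
first = msg.attachments                # triggers the parse
second = msg.attachments               # served from the cache

print(count_before_access, msg.parse_count, first)
```

Callers that never touch attachments never pay for parsing them, which is the speed benefit the comment is after.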
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
Nice one, I'll give it a try. I am bench-testing various ways to extract data from about 100,000 messages.
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
You actually can (honestly, I need to write some actual documentation, but that would take a while and I am much better at writing code than I am at documenting it). There are 2 options (before 0.28.0 there was only 1) that allow you to do this. The first is delayAttachments, which delays the parsing of attachments until you actively try to retrieve them. The second is attachmentErrorBehavior, which was just added. With this option, you can specify how you want the class to deal with attachment errors. You currently have 3 options: don't catch errors, catch NotImplementedErrors, or catch all errors.
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
Great stuff. On that other issue, as a workaround, if it is a case of weird attachments, why not just parse the text body but ignore the attachment with a warning? That way you'd still get something.
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
Lol, whoops. I made a small mistake in the code. While it won't actually affect anything other than a warning, it's not a good thing to have. That's why that warning is appearing: I forgot to unpack the value from the returned tuple.
As for the NotImplementedError, yes, that is a known issue. It has to do with that being something that could take any format because it is application specific rather than file format specific.
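The "Expected length (0,), got 0" warning is what a forgotten tuple unpack looks like in practice: a one-element tuple never compares equal to the integer it contains. A small sketch of the effect is below. It assumes the length comes from a struct.unpack-style call, which always returns a tuple; the library code at that point is not shown in this thread, so the variable names are illustrative.

```python
import struct

raw = b'\x00\x00\x00\x00'
actual_length = 0

# struct.unpack always returns a tuple, even for a single value.
unpacked = struct.unpack('<I', raw)

# Buggy comparison: a tuple is never equal to an int, so this reports a mismatch.
buggy_mismatch = unpacked != actual_length

# Fixed comparison: unpack the single element first.
(expected_length,) = unpacked
fixed_mismatch = expected_length != actual_length

print(unpacked, buggy_mismatch, fixed_mismatch)
```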
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
Looks like it is fixed now, thanks. Returns:
2021-01-07 16:59:48,105 - root - DEBUG - 1102
2021-01-07 16:59:48,105 - root - DEBUG - b'\x00\x00\x00\x00\x00\x00\x00\x00'
2021-01-07 16:59:48,106 - extract_msg.utils - WARNING - Error while parsing multiple type. Expected length (0,), got 0. Ignoring.
But the body of the msg is parsed correctly.
(NB: that deals with the vast majority of msgs that were failing. The only remaining issue is the ones that throw: "NotImplementedError: Current version of extract_msg does not support extraction of containers that are not embedded msg files." But that looks like a known issue.)
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
extract_msg returned "command not found" after installing via pip
I have received confirmation that this should now be working in version 0.28.0. If I don't get a response in the negative within the week, I will assume you are having no more problems and close this issue.
comment created time in 11 days
issue closed TeamMsgExtractor/msg-extractor
Add a wrapper script in the bin directory to make it easier to run extract_msg
Bug Metadata
- Version of extract_msg: 0.27.8
- Your python version: Python 3.9.0
- How did you launch extract_msg?
python3 -m extract_msg
Describe the bug
The pip package doesn't install a script in the bin directory (or maybe it doesn't happen on Python 3.9 or Fedora for some reason). I have to use python3 -m extract_msg ... in order to run the main script. It would be nice if, when installing the wheel file via pip, a wrapper script was installed in the normal bin directory to make it easier to launch. ~/.local/bin is normally added to your PATH, for example.
closed time in 11 days
zeroepoch issue comment TeamMsgExtractor/msg-extractor
Add a wrapper script in the bin directory to make it easier to run extract_msg
Thanks @TheElementalOfDestruction! It's working as expected now. I see you were also able to remove the wrapper script, which is nice.
comment created time in 11 days
created tag TeamMsgExtractor/msg-extractor
Extracts emails and attachments saved in Microsoft Outlook's .msg files
created time in 11 days
push event TeamMsgExtractor/msg-extractor
commit sha 9b919f55332a01ee5e7e32c7f680100bc49b1775
v0.28.0
commit sha 1de3677e9a542105238d57a2976b7d23c856207f
Merge pull request #180 from TheElementalOfDestruction/master v0.28.0
push time in 11 days
PR merged TeamMsgExtractor/msg-extractor
v0.28.0
- [TeamMsgExtractor #87] Added a new system to handle NotImplementedError and other exceptions. All msg classes now have an option called attachmentErrorBehavior that tells the class what to do if it has an error. The value should be one of three constants: ATTACHMENT_ERROR_THROW, ATTACHMENT_ERROR_NOT_IMPLEMENTED, or ATTACHMENT_ERROR_BROKEN. ATTACHMENT_ERROR_THROW tells the class to not catch any exceptions and just let the user handle them. ATTACHMENT_ERROR_NOT_IMPLEMENTED tells the class to catch NotImplementedError exceptions and put an instance of UnsupportedAttachment in place of a regular attachment. ATTACHMENT_ERROR_BROKEN tells the class to catch all exceptions and replace the attachment with either UnsupportedAttachment if it is a NotImplementedError or BrokenAttachment for all other exceptions. With both of those options, caught exceptions will be logged.
- In making the previous point work, much code from Attachment has been moved to a new class called AttachmentBase. Both BrokenAttachment and UnsupportedAttachment are subclasses of AttachmentBase, meaning data can be extracted from their streams in the same way as a functioning attachment.
- [TeamMsgExtractor #162] Pretty sure I actually got it this time. The execution flag should be applied by pip now.
- Fixed typos in some exceptions
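The three behaviors described in the first item can be modeled with a small dispatcher. This is a toy sketch of the policy only: the constants, placeholder classes, and load_attachment function below are hypothetical names, not the extract_msg implementation.

```python
import logging

logger = logging.getLogger("attachment_demo")

# Hypothetical stand-ins for the three behavior constants.
ERROR_THROW = 0
ERROR_NOT_IMPLEMENTED = 1
ERROR_BROKEN = 2


class UnsupportedPlaceholder:
    def __init__(self, error):
        self.error = error


class BrokenPlaceholder:
    def __init__(self, error):
        self.error = error


def load_attachment(parse, behavior=ERROR_THROW):
    """Call parse() and apply the chosen error policy to any failure."""
    try:
        return parse()
    except NotImplementedError as exc:
        if behavior in (ERROR_NOT_IMPLEMENTED, ERROR_BROKEN):
            logger.exception("Unsupported attachment")
            return UnsupportedPlaceholder(exc)
        raise
    except Exception as exc:
        if behavior == ERROR_BROKEN:
            logger.exception("Broken attachment")
            return BrokenPlaceholder(exc)
        raise


def unsupported():
    raise NotImplementedError("container type not supported")


def broken():
    raise ValueError("corrupt stream")


a = load_attachment(unsupported, ERROR_NOT_IMPLEMENTED)
b = load_attachment(broken, ERROR_BROKEN)
c = load_attachment(unsupported, ERROR_BROKEN)
```

With ERROR_THROW, or with ERROR_NOT_IMPLEMENTED and an error other than NotImplementedError, the exception propagates to the caller unchanged.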
pr closed time in 11 days
issue comment TeamMsgExtractor/msg-extractor
Non msg containers embedded as attachments cause all attachments to fail
Version 0.28.0 has been released with support for this feature. All msg classes now have an additional option known as attachmentErrorBehavior, which can be used to specify how attachment exceptions are handled.
comment created time in 11 days
issue comment TeamMsgExtractor/msg-extractor
Add a wrapper script in the bin directory to make it easier to run extract_msg
Thanks to @staticaland I figured out how to do this (I think). Based on some tests that I did, this should now be completely fixed in 0.28.0
comment created time in 11 days
PR opened TeamMsgExtractor/msg-extractor
v0.28.0
pr created time in 11 days