pefile¶
Portable Executable Reader Module
All of the basic PE file structures are available with their default names as attributes of the instance returned.
Processed elements, such as the import table, are available with lowercase names to differentiate them from the uppercase basic-structure names.
pefile has been tested against many edge cases, such as corrupted and malformed PEs, as well as malware, which often attempts to abuse the format beyond its intended use. To the best of my knowledge, most of the abuse is handled gracefully.
-
class
pefile.BaseRelocationData(**args)¶ Holds base relocation information.
- struct
- IMAGE_BASE_RELOCATION structure.
- entries
- List of relocation data as RelocationData instances.
-
class
pefile.BoundImportDescData(**args)¶ Holds bound import descriptor data.
This directory entry will provide information on the DLLs this PE file has been bound to (if bound at all). The structure will contain the name and timestamp of the DLL at the time of binding so that the loader can know whether it differs from the one currently present in the system and must, therefore, re-bind the PE’s imports.
- struct
- IMAGE_BOUND_IMPORT_DESCRIPTOR structure.
- name
- DLL name.
- entries
- List of entries as BoundImportRefData instances. The entries will exist if this DLL has forwarded symbols. If so, the destination DLL will have an entry in this list.
-
class
pefile.BoundImportRefData(**args)¶ Holds bound import forwarder reference data.
Contains the same information as the bound descriptor but for forwarded DLLs, if any.
- struct
- IMAGE_BOUND_FORWARDER_REF structure.
- name
- DLL name.
-
class
pefile.DataContainer(**args)¶ Generic data container.
-
class
pefile.DebugData(**args)¶ Holds debug information.
- struct
- IMAGE_DEBUG_DIRECTORY structure.
- entries
- List of entries as IMAGE_DEBUG_TYPE instances.
-
class
pefile.Dump¶ Convenience class for dumping the PE information.
-
add(txt, indent=0)¶ Adds some text, no newline will be appended.
The text can be indented with the optional indent argument.
-
add_header(txt)¶ Adds a header element.
-
add_line(txt, indent=0)¶ Adds a line.
The line can be indented with the optional indent argument.
-
add_lines(txt, indent=0)¶ Adds a list of lines.
The list can be indented with the optional indent argument.
-
add_newline()¶ Adds a newline.
-
get_text()¶ Get the text in its current state.
-
-
class
pefile.ExportData(**args)¶ Holds exported symbols’ information.
- ordinal
- Ordinal of the symbol.
- address
- Address of the symbol.
- name
- Name of the symbol; None if the symbol is exported by ordinal only.
- forwarder
- If the symbol is forwarded, it will contain the name of the target symbol, otherwise None.
-
class
pefile.ExportDirData(**args)¶ Holds export directory information.
- struct
- IMAGE_EXPORT_DIRECTORY structure.
- symbols
- List of exported symbols as ExportData instances.
-
class
pefile.ImportData(**args)¶ Holds imported symbol’s information.
- ordinal
- Ordinal of the symbol.
- name
- Name of the symbol.
- bound
- If the symbol is bound, this contains the address.
-
class
pefile.ImportDescData(**args)¶ Holds import descriptor information.
- dll
- Name of the imported DLL.
- imports
- List of imported symbols as ImportData instances.
- struct
- IMAGE_IMPORT_DESCRIPTOR structure.
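The attributes above are enough to walk an import table. The following is a minimal sketch, not pefile code: `imports_by_dll` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins that only mimic the documented `dll`, `imports`, `name` and `ordinal` attributes, and it assumes `name` is None for symbols imported by ordinal only.

```python
from types import SimpleNamespace


def imports_by_dll(pe):
    """Map each imported DLL name to a list of its imported symbols."""
    table = {}
    # DIRECTORY_ENTRY_IMPORT only exists if the import directory was parsed.
    for desc in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in desc.imports:
            # Assumption: name is None when the symbol is imported by ordinal.
            symbol = imp.name if imp.name is not None else imp.ordinal
            table.setdefault(desc.dll, []).append(symbol)
    return table


# Stand-in objects shaped like ImportDescData / ImportData instances.
fake_pe = SimpleNamespace(
    DIRECTORY_ENTRY_IMPORT=[
        SimpleNamespace(
            dll=b"KERNEL32.dll",
            imports=[
                SimpleNamespace(name=b"ExitProcess", ordinal=None),
                SimpleNamespace(name=None, ordinal=17),
            ],
        )
    ]
)
result = imports_by_dll(fake_pe)
```

With a real file, the same helper would presumably be called on the object returned by `pefile.PE('module.dll')`.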
-
class
pefile.LoadConfigData(**args)¶ Holds Load Config data.
- struct
- IMAGE_LOAD_CONFIG_DIRECTORY structure.
- name
- DLL name.
-
class
pefile.PE(name=None, data=None, fast_load=None, max_symbol_exports=8192)¶ A Portable Executable representation.
This class provides access to most of the information in a PE file.
It expects to be supplied the name of the file to load, or PE data to process, and an optional argument fast_load (None by default), which controls whether to load all the directories’ information, which can be quite time-consuming.
Two ways to load and process module.dll:
pe = pefile.PE('module.dll')
pe = pefile.PE(name='module.dll')
If the data is already available in a buffer, the same can be achieved with:
pe = pefile.PE(data=module_dll_data)
The fast_load argument can be given a default by setting its value in the module like this: pefile.fast_load = True. That will make all subsequent instances skip loading the whole PE structure. The full_load() method can be used to parse the missing data at a later stage.
Basic header information will be available in these attributes:
- DOS_HEADER
- NT_HEADERS
- FILE_HEADER
- OPTIONAL_HEADER
All of them will contain among their attributes the members of the corresponding structures as defined in WINNT.H.
The raw data corresponding to the header (from the beginning of the file up to the start of the first section) will be available in the instance’s attribute header as a string.
The sections will be available as a list in the sections attribute. Each entry will contain as attributes all the structure’s members.
Directory entries will be available as attributes (if they exist). No other entries are processed at this point.
- DIRECTORY_ENTRY_IMPORT: list of ImportDescData instances
- DIRECTORY_ENTRY_EXPORT: ExportDirData instance
- DIRECTORY_ENTRY_RESOURCE: ResourceDirData instance
- DIRECTORY_ENTRY_DEBUG: list of DebugData instances
- DIRECTORY_ENTRY_BASERELOC: list of BaseRelocationData instances
- DIRECTORY_ENTRY_TLS: TlsData instance
- DIRECTORY_ENTRY_BOUND_IMPORT: list of BoundImportDescData instances
The following dictionary attributes provide ways of mapping different constants. They accept the numeric value and return the string representation, and vice versa: feed in the string and get the numeric constant.
- DIRECTORY_ENTRY
- IMAGE_CHARACTERISTICS
- SECTION_CHARACTERISTICS
- DEBUG_TYPE
- SUBSYSTEM_TYPE
- MACHINE_TYPE
- RELOCATION_TYPE
- RESOURCE_TYPE
- LANG
- SUBLANG
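A two-way constant mapping of the kind described above can be sketched with a plain dictionary that stores every pair in both directions. This is an illustration only: `two_way_dict` is a hypothetical helper, and the table is a three-entry excerpt using the standard data-directory indexes.

```python
# (name, value) pairs; a small excerpt of the data-directory indexes.
directory_entry_types = [
    ("IMAGE_DIRECTORY_ENTRY_EXPORT", 0),
    ("IMAGE_DIRECTORY_ENTRY_IMPORT", 1),
    ("IMAGE_DIRECTORY_ENTRY_RESOURCE", 2),
]


def two_way_dict(pairs):
    """Build a dict that maps name -> value and value -> name."""
    mapping = {}
    for name, value in pairs:
        mapping[name] = value
        mapping[value] = name
    return mapping


# Local stand-in named after the documented attribute.
DIRECTORY_ENTRY = two_way_dict(directory_entry_types)
```

Looking up `DIRECTORY_ENTRY[1]` then yields the name, and looking up the name yields `1`.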
-
dump_dict(dump=None)¶ Dump all the PE header information into a dictionary.
-
dump_info(dump=None, encoding='ascii')¶ Dump all the PE header information into human readable string.
-
dword_align(offset, base)¶
-
full_load()¶ Process the data directories.
This method will load the data directories which might not have been loaded if the fast_load option was used.
-
generate_checksum()¶
-
get_bytes_from_data(offset, data)¶
-
get_data(rva=0, length=None)¶ Get data regardless of its section.
Given an RVA and the size of the chunk to retrieve, this method will find the section where the data lies and return the data.
-
get_data_from_dword(dword)¶ Return a four byte string representing the double word value (little endian).
-
get_data_from_qword(word)¶ Return an eight byte string representing the quad-word value (little endian).
-
get_data_from_word(word)¶ Return a two byte string representing the word value (little endian).
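The little-endian conversions described by the three get_data_from_* methods can be sketched with the standard struct module. The function names below are hypothetical and this is not necessarily how pefile implements them.

```python
import struct


def data_from_word(value):
    """Two-byte little-endian representation of a 16-bit value."""
    return struct.pack("<H", value)


def data_from_dword(value):
    """Four-byte little-endian representation of a 32-bit value."""
    return struct.pack("<I", value)


def data_from_qword(value):
    """Eight-byte little-endian representation of a 64-bit value."""
    return struct.pack("<Q", value)
```

Note that the least significant byte comes first, so the word 0x5A4D packs to the familiar "MZ" signature bytes.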
-
get_dword_at_rva(rva)¶ Return the double word value at the given RVA.
Returns None if the value can’t be read (i.e., the RVA can’t be mapped to a file offset).
-
get_dword_from_data(data, offset)¶ Convert four bytes of data to a double word (little endian).
The offset is assumed to index into a dword array. So setting it to N will return a dword out of the data starting at offset N*4.
Returns None if the data can’t be turned into a double word.
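The array-style indexing described here can be sketched as follows. `dword_from_data` is a hypothetical stand-in that assumes a non-negative index and returns None when fewer than four bytes remain.

```python
import struct


def dword_from_data(data, index):
    """Read the index-th little-endian dword from data, or None."""
    start = index * 4  # index N addresses the bytes at N*4 .. N*4+3
    if index < 0 or start + 4 > len(data):
        return None
    return struct.unpack_from("<I", data, start)[0]


sample = bytes.fromhex("0100000078563412")
```

Here `dword_from_data(sample, 1)` reads the second dword, starting at byte offset 4.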
-
get_dword_from_offset(offset)¶ Return the double word value at the given file offset (little endian).
-
get_imphash()¶
-
get_import_table(rva, max_length=None)¶
-
get_memory_mapped_image(max_virtual_address=268435456, ImageBase=None)¶ Returns the data corresponding to the memory layout of the PE file.
The data includes the PE header and the sections loaded at offsets corresponding to their relative virtual addresses (the VirtualAddress section header member). Any offset in this data corresponds to the absolute memory address ImageBase+offset.
The optional argument max_virtual_address provides a means of limiting which sections are processed. Any section with its VirtualAddress beyond this value will be skipped. Normally, sections with values beyond this range are just there to confuse tools. It’s a common trick to see in packed executables.
If the optional ImageBase argument is supplied, the file’s relocations will be applied to the image by calling the relocate_image() method. Beware that the relocation information is applied permanently.
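The layout idea can be sketched in a few lines: start from the header bytes and copy each section to the offset given by its VirtualAddress, skipping sections beyond the limit. This is a simplified illustration, not pefile's implementation; `Section` and `map_image` are hypothetical names and alignment handling is ignored.

```python
from collections import namedtuple

# Hypothetical stand-in for a section header plus its raw bytes.
Section = namedtuple("Section", "VirtualAddress data")


def map_image(header, sections, max_virtual_address=0x10000000):
    """Place header and section data at their RVAs in one buffer."""
    image = bytearray(header)
    for section in sorted(sections, key=lambda s: s.VirtualAddress):
        if section.VirtualAddress > max_virtual_address:
            continue  # implausible address, likely there to confuse tools
        end = section.VirtualAddress + len(section.data)
        if len(image) < end:
            # Zero-fill the gap up to the end of this section.
            image.extend(b"\x00" * (end - len(image)))
        image[section.VirtualAddress:end] = section.data
    return bytes(image)


mapped = map_image(b"MZ", [Section(4, b"AB"), Section(0x20000000, b"X")])
```

The second section in the example is dropped because its address exceeds the default limit.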
-
get_offset_from_rva(rva)¶ Get the file offset corresponding to this RVA.
Given an RVA, this method will find the section where the data lies and return the offset within the file.
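The translation from RVA to file offset can be sketched as a search over the section table. This is a simplified illustration under stated assumptions: `Section` and `offset_from_rva` are hypothetical, only three header members are modelled, and file or section alignment and RVAs inside the header area are not handled.

```python
from collections import namedtuple

# Hypothetical subset of the section header members.
Section = namedtuple("Section", "VirtualAddress VirtualSize PointerToRawData")


def offset_from_rva(rva, sections):
    """Return the file offset for rva, or None if no section contains it."""
    for s in sections:
        if s.VirtualAddress <= rva < s.VirtualAddress + s.VirtualSize:
            # Distance into the section, rebased onto its raw data pointer.
            return rva - s.VirtualAddress + s.PointerToRawData
    return None


table = [Section(0x1000, 0x200, 0x400), Section(0x2000, 0x100, 0x600)]
```

For this table, RVA 0x1010 lies 0x10 bytes into the first section and so maps to file offset 0x410.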
-
get_overlay()¶ Get the data appended to the file and not contained within the area described in the headers.
-
get_overlay_data_start_offset()¶ Get the offset of data appended to the file and not contained within the area described in the headers.
-
get_physical_by_rva(rva)¶ Gets the physical address in the PE file from an RVA value.
-
get_qword_at_rva(rva)¶ Return the quad-word value at the given RVA.
Returns None if the value can’t be read (i.e., the RVA can’t be mapped to a file offset).
-
get_qword_from_data(data, offset)¶ Convert eight bytes of data to a quad word (little endian).
The offset is assumed to index into a quad-word array. So setting it to N will return a quad word out of the data starting at offset N*8.
Returns None if the data can’t be turned into a quad word.
-
get_qword_from_offset(offset)¶ Return the quad-word value at the given file offset (little endian).
-
get_resources_strings()¶ Returns a list of all the strings found within the resources (if any).
This method will scan all entries in the resources directory of the PE, if there is one, and will return a list of the strings.
An empty list will be returned otherwise.
-
get_rva_from_offset(offset)¶ Get the RVA corresponding to this file offset.
-
get_section_by_offset(offset)¶ Get the section containing the given file offset.
-
get_section_by_rva(rva)¶ Get the section containing the given address.
-
get_string_at_rva(rva, max_length=1048576)¶ Get an ASCII string located at the given address.
-
get_string_from_data(offset, data)¶ Get an ASCII string from data.
-
get_string_u_at_rva(rva, max_length=65536, encoding=None)¶ Get a Unicode string located at the given address.
-
get_warnings()¶ Return the list of warnings.
Non-critical problems found when parsing the PE file are appended to a list of warnings. This method returns the full list.
-
get_word_at_rva(rva)¶ Return the word value at the given RVA.
Returns None if the value can’t be read (i.e., the RVA can’t be mapped to a file offset).
-
get_word_from_data(data, offset)¶ Convert two bytes of data to a word (little endian).
The offset is assumed to index into a word array. So setting it to N will return a word out of the data starting at offset N*2.
Returns None if the data can’t be turned into a word.
-
get_word_from_offset(offset)¶ Return the word value at the given file offset (little endian).
-
has_relocs()¶ Check whether the PE file has a relocation directory.
-
is_dll()¶ Check whether the file is a standard DLL.
This will return true only if the image has the IMAGE_FILE_DLL flag set.
-
is_driver()¶ Check whether the file is a Windows driver.
This will return true only if there are reliable indicators of the image being a driver.
-
is_exe()¶ Check whether the file is a standard executable.
This will return true only if the file has the IMAGE_FILE_EXECUTABLE_IMAGE flag set, the IMAGE_FILE_DLL flag not set, and the file does not appear to be a driver either.
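The flag logic behind is_dll() and is_exe() can be sketched with plain bit tests on the file header's Characteristics value. The function names are hypothetical and the driver check is reduced to a boolean parameter, since its heuristics are not described here.

```python
# Standard file-header characteristic flags.
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000


def looks_like_dll(characteristics):
    """True if the DLL flag is set."""
    return bool(characteristics & IMAGE_FILE_DLL)


def looks_like_exe(characteristics, is_driver=False):
    """True for an executable image that is neither a DLL nor a driver."""
    return (
        bool(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE)
        and not looks_like_dll(characteristics)
        and not is_driver
    )
```

A DLL usually has both flags set, which is why the executable check must exclude the DLL flag explicitly.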
-
merge_modified_section_data()¶ Update the PE image content with any individual section data that has been modified.
-
parse_data_directories(directories=None, forwarded_exports_only=False, import_dllnames_only=False)¶ Parse and process the PE file’s data directories.
If the optional directories argument is provided, only the directories at the specified indexes will be parsed. Such functionality allows parsing of areas of interest without the burden of having to parse all others. The directories can then be specified as follows.
For export / import only:
directories = [ 0, 1 ]
or (more verbosely):
directories = [ DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_EXPORT'], DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_IMPORT'] ]
If directories is a list, the ones that are processed will be removed, leaving only the ones that are not present in the image.
If forwarded_exports_only is True, then the DIRECTORY_ENTRY_EXPORT attribute will only contain exports that are forwarded to another DLL.
If import_dllnames_only is True, then symbols will not be parsed from the import table and the entries in the DIRECTORY_ENTRY_IMPORT attribute will not have a symbols attribute.
-
parse_debug_directory(rva, size)¶
-
parse_delay_import_directory(rva, size)¶ Walk and parse the delay import directory.
-
parse_directory_bound_imports(rva, size)¶
-
parse_directory_load_config(rva, size)¶
-
parse_directory_tls(rva, size)¶
-
parse_export_directory(rva, size, forwarded_only=False)¶ Parse the export directory.
Given the RVA of the export directory, it will process all its entries.
The exports will be made available as a list of ExportData instances in the DIRECTORY_ENTRY_EXPORT PE attribute.
-
parse_import_directory(rva, size, dllnames_only=False)¶ Walk and parse the import directory.
-
parse_imports(original_first_thunk, first_thunk, forwarder_chain, max_length=None)¶ Parse the imported symbols.
It will fill a list, which will be available as the dictionary attribute imports. Its keys will be the DLL names and its values all the symbols imported from that object.
-
parse_relocations(data_rva, rva, size)¶
-
parse_relocations_directory(rva, size)¶
-
parse_resource_data_entry(rva)¶ Parse a data entry from the resources directory.
-
parse_resource_entry(rva)¶ Parse a directory entry from the resources directory.
-
parse_resources_directory(rva, size=0, base_rva=None, level=0, dirs=None)¶ Parse the resources directory.
Given the RVA of the resources directory, it will process all its entries.
The root will have the corresponding members of its structure, IMAGE_RESOURCE_DIRECTORY, plus entries, a list of all the entries in the directory.
Those entries will have, correspondingly, all the structure’s members (IMAGE_RESOURCE_DIRECTORY_ENTRY) and an additional one, directory, pointing to the IMAGE_RESOURCE_DIRECTORY structure representing upper layers of the tree. This one will also have an entries attribute, pointing to the third, and last, level: another directory with more entries. Those last entries will have a new attribute (either leaf or data_entry can be used to access it). This structure finally points to the resource data. All the members of this structure, IMAGE_RESOURCE_DATA_ENTRY, are available as its attributes.
-
parse_rich_header()¶ Parses the rich header. See Microsoft’s Rich Signature for more information.
Structure:
00 DanS ^ checksum, checksum, checksum, checksum
10 Symbol RVA ^ checksum, Symbol size ^ checksum...
...
XX Rich, checksum, 0, 0,...
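The XOR layout above can be decoded with a short routine. This is a sketch under stated assumptions, not pefile's parser: `decode_rich_header` is a hypothetical name, the input is assumed to start exactly at the masked DanS marker and to be dword-aligned, and the decoded values are returned as raw pairs without interpreting them.

```python
import struct


def decode_rich_header(blob):
    """Return (checksum, [(first, second), ...]) or None if not recognised."""
    end = blob.find(b"Rich")
    if end < 16 or end % 4 or end + 8 > len(blob):
        return None
    # The dword right after "Rich" is the checksum used as the XOR key.
    (key,) = struct.unpack_from("<I", blob, end + 4)
    words = struct.unpack_from("<%dI" % (end // 4), blob, 0)
    decoded = [w ^ key for w in words]
    if struct.pack("<I", decoded[0]) != b"DanS":
        return None
    values = decoded[4:]  # skip DanS and the three padding dwords
    return key, list(zip(values[0::2], values[1::2]))


# Build a tiny synthetic header to exercise the decoder.
key = 0xDEADBEEF
dans = struct.unpack("<I", b"DanS")[0]
blob = struct.pack("<6I", dans ^ key, key, key, key, 0x00010002 ^ key, 3 ^ key)
blob += b"Rich" + struct.pack("<I", key)
decoded = decode_rich_header(blob)
```

The three padding dwords hold zero XORed with the key, which is why they appear as the bare checksum in the layout.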
-
parse_sections(offset)¶ Fetch the PE file sections.
The sections will be readily available in the sections attribute. Its attributes will contain all the section information plus data, a buffer containing the section’s data.
The characteristics member will be processed and attributes representing the section characteristics (with IMAGE_SCN_ trimmed from the constants’ names) will be added to the section instance.
Refer to the SectionStructure class for additional info.
-
parse_version_information(version_struct)¶ Parse version information structure.
The data will be made available in three attributes of the PE object:
- VS_VERSIONINFO
- Contains the first three fields of the main structure: Length, ValueLength, and Type.
- VS_FIXEDFILEINFO
- Holds the rest of the fields, accessible as sub-attributes: Signature, StrucVersion, FileVersionMS, FileVersionLS, ProductVersionMS, ProductVersionLS, FileFlagsMask, FileFlags, FileOS, FileType, FileSubtype, FileDateMS, FileDateLS.
- FileInfo
- List of all StringFileInfo and VarFileInfo structures.
StringFileInfo structures will have a list as an attribute named StringTable containing all the StringTable structures. Each of those structures contains a dictionary entries with all the key/value version information string pairs.
VarFileInfo structures will have a list as an attribute named Var containing all Var structures. Each Var structure will have a dictionary as an attribute named entry which will contain the name and value of the Var.
-
print_info(encoding='utf-8')¶ Print all the PE header information in a human readable form.
-
relocate_image(new_ImageBase)¶ Apply the relocation information to the image using the provided new image base.
This method will apply the relocation information to the image. Given the new base, all the relocations will be processed and both the raw data and the section’s data will be fixed accordingly.
The resulting image can also be retrieved with the get_memory_mapped_image() method, in order to get something that more closely matches what could be found in memory once the Windows loader has finished its work.
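The core of relocation processing can be sketched as adding the base delta to every address a relocation entry points at. This is a minimal illustration, not pefile's code: `apply_relocations` is hypothetical, it works on a flat image indexed by RVA, and it handles only the 32-bit HIGHLOW type plus the ABSOLUTE padding type.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3   # patch a full 32-bit address


def apply_relocations(image, relocations, old_base, new_base):
    """Return a copy of image with (type, rva) relocations applied."""
    delta = new_base - old_base
    patched = bytearray(image)
    for reloc_type, rva in relocations:
        if reloc_type == IMAGE_REL_BASED_ABSOLUTE:
            continue
        if reloc_type != IMAGE_REL_BASED_HIGHLOW:
            raise ValueError("unsupported relocation type %d" % reloc_type)
        (value,) = struct.unpack_from("<I", patched, rva)
        # Wrap to 32 bits, as the stored address is a dword.
        struct.pack_into("<I", patched, rva, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)
```

Unlike the method documented above, this sketch returns a patched copy and leaves its input untouched.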
-
set_bytes_at_offset(offset, data)¶ Overwrite the bytes at the given file offset with the given string.
Return True if successful, False otherwise. It can fail if the offset is outside the file’s boundaries.
-
set_bytes_at_rva(rva, data)¶ Overwrite, with the given string, the bytes at the file offset corresponding to the given RVA.
Return True if successful, False otherwise. It can fail if the offset is outside the file’s boundaries.
-
set_dword_at_offset(offset, dword)¶ Set the double word value at the given file offset.
-
set_dword_at_rva(rva, dword)¶ Set the double word value at the file offset corresponding to the given RVA.
-
set_qword_at_offset(offset, qword)¶ Set the quad-word value at the given file offset.
-
set_qword_at_rva(rva, qword)¶ Set the quad-word value at the file offset corresponding to the given RVA.
-
set_word_at_offset(offset, word)¶ Set the word value at the given file offset.
-
set_word_at_rva(rva, word)¶ Set the word value at the file offset corresponding to the given RVA.
-
show_warnings()¶ Print the list of warnings.
Non-critical problems found when parsing the PE file are appended to a list of warnings. This method prints the full list to standard output.
-
trim()¶ Return just the data defined by the PE headers, removing any overlaid data.
-
verify_checksum()¶
-
write(filename=None)¶ Write the PE file.
This function will process all headers and components of the PE file, include all changes made (by just assigning to attributes in the PE objects), and write the result to the file whose name is provided as an argument. The filename is optional; if it is not provided, the data will be returned as a string.
-
exception
pefile.PEFormatError(value)¶ Generic PE format error exception.
-
class
pefile.RelocationData(**args)¶ Holds relocation information.
- type
- Type of relocation. The type string can be obtained by RELOCATION_TYPE[type].
- rva
- RVA of the relocation.
-
class
pefile.ResourceDataEntryData(**args)¶ Holds resource data entry information.
- struct
- IMAGE_RESOURCE_DATA_ENTRY structure.
- lang
- Primary language ID.
- sublang
- Sublanguage ID.
-
class
pefile.ResourceDirData(**args)¶ Holds resource directory information.
- struct
- IMAGE_RESOURCE_DIRECTORY structure.
- entries
- List of entries as ResourceDirEntryData instances.
-
class
pefile.ResourceDirEntryData(**args)¶ Holds resource directory entry data:
- struct
- IMAGE_RESOURCE_DIRECTORY_ENTRY structure.
- name
- If the resource is identified by name, then this attribute will contain the name string. None otherwise. If identified by ID, then the ID is available at struct.Id.
- id
- The id, also in struct.Id.
- directory
- If this entry has a lower-level directory, then this attribute will point to the ResourceDirData instance representing it.
- data
- If this entry has no further lower-level directories, and points to the actual resource data, then this attribute will reference the corresponding ResourceDataEntryData instance.
Either the directory or the data attribute will exist, but not both.
-
class
pefile.SectionStructure(*argl, **argd)¶ Convenience section-handling class.
-
contains(rva)¶ Deprecated.
Use contains_rva() instead.
-
contains_offset(offset)¶ Check whether the section contains the file offset provided.
-
contains_rva(rva)¶ Check whether the section contains the given address.
-
entropy_H(data)¶ Calculate the entropy of a chunk of data.
-
get_data(start=None, length=None)¶ Get data chunk from a section.
Query data from the section by passing the addresses where the PE file would be loaded by default. It is then possible to retrieve code and data by their real addresses as they would be if loaded.
Returns bytes() for Python 3.x and str() for Python 2.7.
-
get_entropy()¶ Calculate and return the entropy of the section.
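Entropy here presumably means Shannon entropy over byte values, measured in bits per byte (0 for constant data, 8 for uniformly distributed bytes). A minimal sketch under that assumption, with `shannon_entropy` as a hypothetical name:

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total  # relative frequency of this byte value
        entropy -= p * math.log2(p)
    return entropy
```

Values close to 8 are typical of compressed or encrypted data, which is why section entropy is a common indicator of packing.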
-
get_hash_md5()¶ Get the MD5 hex-digest of the section’s data.
-
get_hash_sha1()¶ Get the SHA-1 hex-digest of the section’s data.
-
get_hash_sha256()¶ Get the SHA-256 hex-digest of the section’s data.
-
get_hash_sha512()¶ Get the SHA-512 hex-digest of the section’s data.
-
get_offset_from_rva(rva)¶
-
get_rva_from_offset(offset)¶
-
-
class
pefile.Structure(format, name=None, file_offset=None)¶ Prepare structure object to extract members from data.
Format is a list containing definitions for the elements of the structure.
-
all_zeroes()¶ Returns True if the unpacked data is all zeros.
-
dump(indentation=0)¶ Returns a string representation of the structure.
-
dump_dict()¶ Returns a dictionary representation of the structure.
-
get_field_absolute_offset(field_name)¶ Return the offset within the file for the requested field in the structure.
-
get_field_relative_offset(field_name)¶ Return the offset within the structure for the requested field.
-
sizeof()¶ Returns the size of the structure.
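The relationship between relative offsets, absolute offsets and the structure size can be sketched with struct.calcsize. This is an illustration, not pefile's code: the 'type,name' string layout of the format list is an assumption, `field_offsets` is a hypothetical helper, and the file offset 0x84 is made up.

```python
import struct

# Assumed 'type,name' field definitions for IMAGE_FILE_HEADER.
file_header_format = (
    "H,Machine",
    "H,NumberOfSections",
    "I,TimeDateStamp",
    "I,PointerToSymbolTable",
    "I,NumberOfSymbols",
    "H,SizeOfOptionalHeader",
    "H,Characteristics",
)


def field_offsets(fields):
    """Return ({name: relative offset}, total size) for packed fields."""
    offsets = {}
    position = 0
    for definition in fields:
        code, name = definition.split(",", 1)
        offsets[name] = position
        position += struct.calcsize("<" + code)  # no padding with '<'
    return offsets, position


relative, size = field_offsets(file_header_format)
file_offset = 0x84  # hypothetical position of the structure in the file
absolute = {name: file_offset + off for name, off in relative.items()}
```

The absolute offset of a field is simply the structure's file offset plus the field's relative offset.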
-
-
class
pefile.TlsData(**args)¶ Holds TLS information.
- struct
- IMAGE_TLS_DIRECTORY structure.
-
class
pefile.UnicodeStringWrapperPostProcessor(pe, rva_ptr)¶ Attempt to identify strings in plain Unicode or Pascal.
A list of strings will be wrapped on the object with the hope that any overlapping will clarify its type.
-
ask_unicode_16(next_rva_ptr)¶ The next RVA is taken to be the one immediately following this one.
Such RVA could indicate the natural end of the string and will be checked to see if there’s a Unicode NULL character there.
-
decode(*args)¶
-
get_pascal_16_length()¶
-
get_rva()¶ Get the RVA of the string.
-
invalidate()¶ Make this instance None to express that it is not a known string type.
-
render_pascal_16()¶
-
-
pefile.is_valid_dos_filename(s)¶
-
pefile.is_valid_function_name(s)¶
-
pefile.retrieve_flags(flag_dict, flag_filter)¶ Read the flags from a dictionary and return them in a usable format.
Will return a list of (flag, value) pairs for all flags in flag_dict matching the filter flag_filter.
-
pefile.set_flags(obj, flag_field, flags)¶ Process the flags and set attributes in the object accordingly.
The object obj will gain attributes named after the flags provided in flags and valued True/False, matching the results of applying each flag value from flags to flag_field.
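The two helpers work as a pair: one selects the named flags, the other turns them into boolean attributes. The sketch below mimics that behaviour with hypothetical `_sketch` functions and assumes the filter is a name prefix; the dictionary is a small excerpt of the section characteristics with one reverse entry to show that numeric keys are ignored.

```python
section_characteristics = {
    "IMAGE_SCN_CNT_CODE": 0x00000020,
    "IMAGE_SCN_MEM_EXECUTE": 0x20000000,
    "IMAGE_SCN_MEM_READ": 0x40000000,
    "IMAGE_SCN_MEM_WRITE": 0x80000000,
    0x00000020: "IMAGE_SCN_CNT_CODE",  # reverse entry of a two-way dict
}


def retrieve_flags_sketch(flag_dict, flag_filter):
    """List (name, value) pairs whose name starts with flag_filter."""
    return [
        (name, value)
        for name, value in flag_dict.items()
        if isinstance(name, str) and name.startswith(flag_filter)
    ]


def set_flags_sketch(obj, flag_field, flags):
    """Set a True/False attribute on obj for every (name, value) flag."""
    for name, value in flags:
        setattr(obj, name, bool(flag_field & value))


class FakeSection:
    """Empty stand-in for a section instance."""


section = FakeSection()
flags = retrieve_flags_sketch(section_characteristics, "IMAGE_SCN_")
# 0x60000020 = code + execute + read, the usual .text characteristics.
set_flags_sketch(section, 0x60000020, flags)
```

After the call, `section.IMAGE_SCN_MEM_EXECUTE` is True and `section.IMAGE_SCN_MEM_WRITE` is False.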