pefile

Portable Executable Reader Module

All of the basic PE file structures are available with their default names as attributes of the instance returned.

Processed elements, such as the import table, are available with lowercase names to differentiate them from the uppercase basic-structure names.

pefile has been tested against many edge cases, such as corrupted and malformed PEs, as well as malware, which often attempts to abuse the format beyond its intended use. To the best of my knowledge, most of the abuse is handled gracefully.

class pefile.BaseRelocationData(**args)

Holds base relocation information.

struct
IMAGE_BASE_RELOCATION structure.
entries
List of relocation data as RelocationData instances.
class pefile.BoundImportDescData(**args)

Holds bound import descriptor data.

This directory entry will provide information on the DLLs this PE file has been bound to (if bound at all). The structure will contain the name and timestamp of the DLL at the time of binding so that the loader can know whether it differs from the one currently present in the system and must, therefore, re-bind the PE’s imports.

struct
IMAGE_BOUND_IMPORT_DESCRIPTOR structure.
name
DLL name.
entries
List of entries as BoundImportRefData instances. The entries will exist if this DLL has forwarded symbols. If so, the destination DLL will have an entry in this list.
class pefile.BoundImportRefData(**args)

Holds bound import forwarder reference data.

Contains the same information as the bound descriptor but for forwarded DLLs, if any.

struct
IMAGE_BOUND_FORWARDER_REF structure.
name
DLL name.
class pefile.DataContainer(**args)

Generic data container.

class pefile.DebugData(**args)

Holds debug information.

struct
IMAGE_DEBUG_DIRECTORY structure.
entries
List of entries as IMAGE_DEBUG_TYPE instances.
class pefile.Dump

Convenience class for dumping the PE information.

add(txt, indent=0)

Adds some text; no newline will be appended.

The text can be indented with the optional indent argument.

add_header(txt)

Adds a header element.

add_line(txt, indent=0)

Adds a line.

The line can be indented with the optional indent argument.

add_lines(txt, indent=0)

Adds a list of lines.

The list can be indented with the optional indent argument.

add_newline()

Adds a newline.

get_text()

Get the text in its current state.

class pefile.ExportData(**args)

Holds exported symbols’ information.

ordinal
Ordinal of the symbol.
address
Address of the symbol.
name
Name of the symbol; None if the symbol is exported by ordinal only.
forwarder
If the symbol is forwarded, it will contain the name of the target symbol, otherwise None.
class pefile.ExportDirData(**args)

Holds export directory information.

struct
IMAGE_EXPORT_DIRECTORY structure.
symbols
List of exported symbols as ExportData instances.
class pefile.ImportData(**args)

Holds imported symbol’s information.

ordinal
Ordinal of the symbol.
name
Name of the symbol.
bound
If the symbol is bound, this contains the address.
class pefile.ImportDescData(**args)

Holds import descriptor information.

dll
Name of the imported DLL.
imports
List of imported symbols as ImportData instances.
struct
IMAGE_IMPORT_DESCRIPTOR structure.
class pefile.LoadConfigData(**args)

Holds Load Config data.

struct
IMAGE_LOAD_CONFIG_DIRECTORY structure.
name
DLL name.
class pefile.PE(name=None, data=None, fast_load=None, max_symbol_exports=8192)

A Portable Executable representation.

This class provides access to most of the information in a PE file.

It expects either the name of the file to load or the PE data to process. The optional fast_load argument (None by default) controls whether all the directory information is loaded, which can be quite time-consuming.

Two equivalent ways to load and process module.dll:

  1. pe = pefile.PE('module.dll')
  2. pe = pefile.PE(name='module.dll')

If the data is already available in a buffer, the same can be achieved with:

  1. pe = pefile.PE(data=module_dll_data)

A default for the fast_load argument can be set at module level like this: pefile.fast_load = True. All subsequent instances will then skip loading the whole PE structure. The full_load() method can be used to parse the missing data at a later stage.

Basic headers information will be available in these attributes:

  • DOS_HEADER
  • NT_HEADERS
  • FILE_HEADER
  • OPTIONAL_HEADER

All of them will contain among their attributes the members of the corresponding structures as defined in WINNT.H.

The raw data corresponding to the header (from the beginning of the file up to the start of the first section) will be available in the instance’s attribute header as a string.

The sections will be available as a list in the sections attribute. Each entry will contain as attributes all the structure’s members.

Directory entries will be available as attributes (if they exist). No other entries are processed at this point.

The following dictionary attributes map constants in both directions. Given a numeric value, they return its string representation; given the string, they return the numeric constant:

  • DIRECTORY_ENTRY
  • IMAGE_CHARACTERISTICS
  • SECTION_CHARACTERISTICS
  • DEBUG_TYPE
  • SUBSYSTEM_TYPE
  • MACHINE_TYPE
  • RELOCATION_TYPE
  • RESOURCE_TYPE
  • LANG
  • SUBLANG
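A two-way mapping of this kind can be sketched in a few lines. This is an illustration only, not pefile's code: the helper two_way_dict and the name DIRECTORY_ENTRY_SKETCH are hypothetical, and only three directory entries are listed (their values follow WINNT.H).

```python
# Hypothetical subset of (name, value) pairs; the values follow WINNT.H.
directory_entry_types = [
    ("IMAGE_DIRECTORY_ENTRY_EXPORT", 0),
    ("IMAGE_DIRECTORY_ENTRY_IMPORT", 1),
    ("IMAGE_DIRECTORY_ENTRY_RESOURCE", 2),
]


def two_way_dict(pairs):
    """Build a dict that maps name -> value and value -> name."""
    mapping = {}
    for name, value in pairs:
        mapping[name] = value
        mapping[value] = name
    return mapping


# Hypothetical stand-in for a dictionary such as DIRECTORY_ENTRY.
DIRECTORY_ENTRY_SKETCH = two_way_dict(directory_entry_types)

print(DIRECTORY_ENTRY_SKETCH[1])
print(DIRECTORY_ENTRY_SKETCH["IMAGE_DIRECTORY_ENTRY_IMPORT"])
```

Storing both directions in one dictionary works because the names are strings and the values are integers, so the two key spaces cannot collide.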
dump_dict(dump=None)

Dump all the PE header information into a dictionary.

dump_info(dump=None, encoding='ascii')

Dump all the PE header information into a human-readable string.

dword_align(offset, base)
full_load()

Process the data directories.

This method will load the data directories which might not have been loaded if the fast_load option was used.

generate_checksum()
get_bytes_from_data(offset, data)
get_data(rva=0, length=None)

Get data regardless of its section.

Given an RVA and the size of the chunk to retrieve, this method will find the section where the data lies and return the data.

get_data_from_dword(dword)

Return a four byte string representing the double word value (little endian).

get_data_from_qword(word)

Return an eight byte string representing the quad-word value (little endian).

get_data_from_word(word)

Return a two byte string representing the word value (little endian).
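The little-endian conversions described for get_data_from_word(), get_data_from_dword() and get_data_from_qword() can be illustrated with the standard struct module. This is a sketch of the idea rather than pefile's implementation; the function names below are hypothetical.

```python
import struct


def data_from_word(word):
    """Pack a 16-bit value into two little-endian bytes."""
    return struct.pack("<H", word)


def data_from_dword(dword):
    """Pack a 32-bit value into four little-endian bytes."""
    return struct.pack("<L", dword)


def data_from_qword(qword):
    """Pack a 64-bit value into eight little-endian bytes."""
    return struct.pack("<Q", qword)


# The DOS magic 0x5A4D and the PE signature 0x00004550 are stored
# least significant byte first, so they read as text in a hex dump.
print(data_from_word(0x5A4D))
print(data_from_dword(0x00004550))
```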

get_dword_at_rva(rva)

Return the double word value at the given RVA.

Returns None if the value can’t be read (i.e., the RVA can’t be mapped to a file offset).

get_dword_from_data(data, offset)

Convert four bytes of data to a double word (little endian).

The offset is assumed to index into a dword array. So setting it to N will return a dword out of the data starting at offset N*4.

Returns None if the data can’t be turned into a double word.
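The array-style indexing can be sketched as follows, under the assumption that a short buffer yields None. The function dword_from_data is a hypothetical stand-in, not the library's own code.

```python
import struct


def dword_from_data(data, index):
    """Read the index-th little-endian dword from data, or None if out of range."""
    start = index * 4  # the index counts dwords, not bytes
    if index < 0 or start + 4 > len(data):
        return None
    return struct.unpack_from("<L", data, start)[0]


buffer = bytes([1, 0, 0, 0, 2, 0, 0, 0])
print(dword_from_data(buffer, 1))  # the dword stored at byte offset 4
print(dword_from_data(buffer, 2))  # past the end of the buffer
```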

get_dword_from_offset(offset)

Return the double word value at the given file offset (little endian).

get_imphash()
get_import_table(rva, max_length=None)
get_memory_mapped_image(max_virtual_address=268435456, ImageBase=None)

Returns the data corresponding to the memory layout of the PE file.

The data includes the PE header and the sections loaded at offsets corresponding to their relative virtual addresses (the VirtualAddress section header member). Any offset in this data corresponds to the absolute memory address ImageBase+offset.

The optional argument max_virtual_address provides a means of limiting which sections are processed. Any section with its VirtualAddress beyond this value will be skipped. Normally, sections with values beyond this range are just there to confuse tools. It’s a common trick to see in packed executables.

If the optional ImageBase argument is supplied, the file’s relocations will be applied to the image by calling the relocate_image() method. Beware that the relocation information is applied permanently.
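The layout step can be pictured with a small sketch that places each section's bytes at its VirtualAddress and pads the gaps with zeros. It ignores relocations and alignment, and the function map_image and its (virtual_address, data) tuples are hypothetical simplifications, not pefile's data model.

```python
def map_image(header, sections, max_virtual_address=0x10000000):
    """Lay out header and section data at their relative virtual addresses."""
    image = bytearray(header)
    for virtual_address, data in sections:
        if virtual_address > max_virtual_address:
            continue  # skip sections placed beyond the limit
        if len(image) < virtual_address:
            # Pad with zeros up to the start of the section.
            image.extend(b"\x00" * (virtual_address - len(image)))
        image[virtual_address:virtual_address + len(data)] = data
    return bytes(image)


image = map_image(b"MZ", [(8, b"AB"), (0x20000000, b"X")])
print(len(image), image)
```

In the resulting buffer, the byte at index N corresponds to the address ImageBase+N, which is why a bogus section with a huge VirtualAddress would otherwise force an enormous allocation.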

get_offset_from_rva(rva)

Get the file offset corresponding to this RVA.

Given an RVA, this method will find the section where the data lies and return the offset within the file.
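The underlying arithmetic is a lookup of the containing section followed by a rebase from its VirtualAddress to its PointerToRawData. The sketch below uses a hypothetical Section tuple and returns None when no section matches; the real method also deals with alignment and malformed headers, which are not modeled here.

```python
from collections import namedtuple

# Hypothetical stand-in carrying three members of the section header.
Section = namedtuple("Section", "VirtualAddress VirtualSize PointerToRawData")


def offset_from_rva(sections, rva):
    """Map an RVA to a file offset through the section that contains it."""
    for section in sections:
        start = section.VirtualAddress
        if start <= rva < start + section.VirtualSize:
            return rva - start + section.PointerToRawData
    return None  # the RVA is not covered by any section


sections = [Section(0x1000, 0x500, 0x400), Section(0x2000, 0x200, 0xA00)]
print(hex(offset_from_rva(sections, 0x1010)))
```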

get_overlay()

Get the data appended to the file and not contained within the area described in the headers.

get_overlay_data_start_offset()

Get the offset of data appended to the file and not contained within the area described in the headers.

get_physical_by_rva(rva)

Gets the physical address in the PE file from an RVA value.

get_qword_at_rva(rva)

Return the quad-word value at the given RVA.

Returns None if the value can’t be read (i.e., the RVA can’t be mapped to a file offset).

get_qword_from_data(data, offset)

Convert eight bytes of data to a quad word (little endian).

The offset is assumed to index into a qword array. So setting it to N will return a qword out of the data starting at offset N*8.

Returns None if the data can’t be turned into a quad word.

get_qword_from_offset(offset)

Return the quad-word value at the given file offset (little endian).

get_resources_strings()

Returns a list of all the strings found within the resources (if any).

This method will scan all entries in the resources directory of the PE, if there is one, and will return a list of the strings.

An empty list will be returned otherwise.

get_rva_from_offset(offset)

Get the RVA corresponding to this file offset.

get_section_by_offset(offset)

Get the section containing the given file offset.

get_section_by_rva(rva)

Get the section containing the given address.

get_string_at_rva(rva, max_length=1048576)

Get an ASCII string located at the given address.

get_string_from_data(offset, data)

Get an ASCII string from data.

get_string_u_at_rva(rva, max_length=65536, encoding=None)

Get a Unicode string located at the given address.

get_warnings()

Return the list of warnings.

Non-critical problems found when parsing the PE file are appended to a list of warnings. This method returns the full list.

get_word_at_rva(rva)

Return the word value at the given RVA.

Returns None if the value can’t be read (i.e., the RVA can’t be mapped to a file offset).

get_word_from_data(data, offset)

Convert two bytes of data to a word (little endian).

The offset is assumed to index into a word array. So setting it to N will return a word out of the data starting at offset N*2.

Returns None if the data can’t be turned into a word.

get_word_from_offset(offset)

Return the word value at the given file offset (little endian).

has_relocs()

Check whether the PE file has a relocation directory.

is_dll()

Check whether the file is a standard DLL.

This will return true only if the image has the IMAGE_FILE_DLL flag set.

is_driver()

Check whether the file is a Windows driver.

This will return true only if there are reliable indicators of the image being a driver.

is_exe()

Check whether the file is a standard executable.

This will return true only if the file has the IMAGE_FILE_EXECUTABLE_IMAGE flag set, does not have the IMAGE_FILE_DLL flag set, and does not appear to be a driver.
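The flag tests behind is_dll() and is_exe() amount to bit checks on the Characteristics member of the file header. The sketch below assumes the WINNT.H flag values; the function classify and its looks_like_driver parameter are hypothetical, and the driver heuristics themselves are not modeled.

```python
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002  # value from WINNT.H
IMAGE_FILE_DLL = 0x2000               # value from WINNT.H


def classify(characteristics, looks_like_driver=False):
    """Return the DLL/EXE verdicts for a Characteristics value."""
    is_dll = bool(characteristics & IMAGE_FILE_DLL)
    is_exe = (
        bool(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE)
        and not is_dll
        and not looks_like_driver  # assumed to come from a separate check
    )
    return {"is_dll": is_dll, "is_exe": is_exe}


print(classify(0x0102))
print(classify(0x2102))
```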

merge_modified_section_data()

Update the PE image content with any individual section data that has been modified.

parse_data_directories(directories=None, forwarded_exports_only=False, import_dllnames_only=False)

Parse and process the PE file’s data directories.

If the optional directories argument is provided, only the directories at the specified indexes will be parsed. Such functionality allows parsing of areas of interest without the burden of having to parse all others. The directories can then be specified as:

For export / import only:

directories = [ 0, 1 ]

or (more verbosely):

directories = [ DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_IMPORT'],
    DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_EXPORT'] ]

If directories is a list, the ones that are processed will be removed, leaving only the ones that are not present in the image.

If forwarded_exports_only is True, then the IMAGE_DIRECTORY_ENTRY_EXPORT attribute will only contain exports that are forwarded to another DLL.

If import_dllnames_only is True, then symbols will not be parsed from the import table and the entries in the IMAGE_DIRECTORY_ENTRY_IMPORT attribute will not have a symbols attribute.

parse_debug_directory(rva, size)
parse_delay_import_directory(rva, size)

Walk and parse the delay import directory.

parse_directory_bound_imports(rva, size)
parse_directory_load_config(rva, size)
parse_directory_tls(rva, size)
parse_export_directory(rva, size, forwarded_only=False)

Parse the export directory.

Given the RVA of the export directory, it will process all its entries.

The exports will be made available as a list of ExportData instances in the IMAGE_DIRECTORY_ENTRY_EXPORT PE attribute.

parse_import_directory(rva, size, dllnames_only=False)

Walk and parse the import directory.

parse_imports(original_first_thunk, first_thunk, forwarder_chain, max_length=None)

Parse the imported symbols.

It will fill a list, which will be available as the dictionary attribute imports. Its keys will be the DLL names and its values all the symbols imported from that object.

parse_relocations(data_rva, rva, size)
parse_relocations_directory(rva, size)
parse_resource_data_entry(rva)

Parse a data entry from the resources directory.

parse_resource_entry(rva)

Parse a directory entry from the resources directory.

parse_resources_directory(rva, size=0, base_rva=None, level=0, dirs=None)

Parse the resources directory.

Given the RVA of the resources directory, it will process all its entries.

The root will have the corresponding member of its structure, IMAGE_RESOURCE_DIRECTORY plus entries, a list of all the entries in the directory.

Those entries will have, correspondingly, all the structure’s members (IMAGE_RESOURCE_DIRECTORY_ENTRY) and an additional one, directory, pointing to the IMAGE_RESOURCE_DIRECTORY structure representing upper layers of the tree. This one will also have an entries attribute, pointing to the third, and last, level: another directory with more entries. Those last entries will have a new attribute (either leaf or data_entry can be used to access it). This structure finally points to the resource data. All the members of this structure, IMAGE_RESOURCE_DATA_ENTRY, are available as its attributes.

parse_rich_header()

Parses the rich header. See Microsoft’s Rich Signature for more information.

Structure:

00 DanS ^ checksum, checksum, checksum, checksum
10 Symbol RVA ^ checksum, Symbol size ^ checksum...
...
XX Rich, checksum, 0, 0,...
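The XOR layout above can be decoded with a short sketch: locate the Rich marker, take the dword after it as the key, and XOR every preceding dword with that key. This is an illustration under simplifying assumptions (the blob starts at the DanS dword, and no encoded dword happens to equal the Rich marker); decode_rich_dwords is a hypothetical name, not a pefile function.

```python
import struct

DANS = struct.unpack("<L", b"DanS")[0]
RICH = struct.unpack("<L", b"Rich")[0]


def decode_rich_dwords(blob):
    """Return (key, decoded dwords after the 16-byte prefix), or None."""
    count = len(blob) // 4
    dwords = struct.unpack("<%dL" % count, blob[:count * 4])
    if RICH not in dwords:
        return None
    rich_index = dwords.index(RICH)
    if rich_index + 1 >= len(dwords):
        return None  # no key stored after the marker
    key = dwords[rich_index + 1]
    decoded = [value ^ key for value in dwords[:rich_index]]
    if len(decoded) < 4 or decoded[0] != DANS:
        return None  # the header does not start with DanS ^ key
    return key, decoded[4:]


# Build a synthetic header with a made-up key and two made-up values.
key = 0x11223344
values = [0x00010002, 3]
encoded = [DANS ^ key, key, key, key] + [v ^ key for v in values]
blob = struct.pack("<%dL" % (len(encoded) + 2), *(encoded + [RICH, key]))
print(decode_rich_dwords(blob))
```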
parse_sections(offset)

Fetch the PE file sections.

The sections will be readily available in the sections attribute. Its attributes will contain all the section information plus data, a buffer containing the section’s data.

The characteristics member will be processed and attributes representing the section characteristics (with IMAGE_SCN_ trimmed from the constants’ names) will be added to the section instance.

Refer to the SectionStructure class for additional info.

parse_version_information(version_struct)

Parse version information structure.

The data will be made available in three attributes of the PE object:

VS_VERSIONINFO
Contains the first three fields of the main structure: Length, ValueLength, and Type.
VS_FIXEDFILEINFO
Holds the rest of the fields, accessible as sub-attributes: Signature, StrucVersion, FileVersionMS, FileVersionLS, ProductVersionMS, ProductVersionLS, FileFlagsMask, FileFlags, FileOS, FileType, FileSubtype, FileDateMS, FileDateLS.
FileInfo

List of all StringFileInfo and VarFileInfo structures.

StringFileInfo structures will have a list as an attribute named StringTable containing all the StringTable structures. Each of those structures contains a dictionary entries with all the key/value version information string pairs.

VarFileInfo structures will have a list as an attribute named Var containing all Var structures. Each Var structure will have a dictionary as an attribute named entry which will contain the name and value of the Var.

print_info(encoding='utf-8')

Print all the PE header information in a human-readable form.

relocate_image(new_ImageBase)

Apply the relocation information to the image using the provided new image base.

This method will apply the relocation information to the image. Given the new base, all the relocations will be processed and both the raw data and the section’s data will be fixed accordingly.

The resulting image can also be retrieved through the get_memory_mapped_image() method, in order to get something that more closely matches what would be found in memory once the Windows loader has finished its work.

set_bytes_at_offset(offset, data)

Overwrite the bytes at the given file offset with the given string.

Return True if successful, False otherwise. It can fail if the offset is outside the file’s boundaries.

set_bytes_at_rva(rva, data)

Overwrite, with the given string, the bytes at the file offset corresponding to the given RVA.

Return True if successful, False otherwise. It can fail if the offset is outside the file’s boundaries.

set_dword_at_offset(offset, dword)

Set the double word value at the given file offset.

set_dword_at_rva(rva, dword)

Set the double word value at the file offset corresponding to the given RVA.

set_qword_at_offset(offset, qword)

Set the quad-word value at the given file offset.

set_qword_at_rva(rva, qword)

Set the quad-word value at the file offset corresponding to the given RVA.

set_word_at_offset(offset, word)

Set the word value at the given file offset.

set_word_at_rva(rva, word)

Set the word value at the file offset corresponding to the given RVA.

show_warnings()

Print the list of warnings.

Non-critical problems found when parsing the PE file are appended to a list of warnings. This method prints the full list to standard output.

trim()

Return just the data defined by the PE headers, removing any overlaid data.

verify_checksum()
write(filename=None)

Write the PE file.

This function will process all headers and components of the PE file, include all changes made (by just assigning to attributes in the PE objects), and write the changes back to a file whose name is provided as an argument. The filename is optional; if it is not provided, the data will be returned as a string.

exception pefile.PEFormatError(value)

Generic PE format error exception.

class pefile.RelocationData(**args)

Holds relocation information.

type
Type of relocation. The type string can be obtained by RELOCATION_TYPE[type].
rva
RVA of the relocation.
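For context, the PE format stores each base relocation entry as a 16-bit word whose top 4 bits are the type and whose low 12 bits are an offset added to the RVA of the enclosing relocation block. The sketch below decodes one such word; decode_relocation_word is a hypothetical helper, not part of pefile, and only shows how a type and rva pair of this kind can arise.

```python
def decode_relocation_word(block_rva, word):
    """Split a 16-bit relocation entry into (type, rva)."""
    reloc_type = word >> 12   # top 4 bits
    offset = word & 0x0FFF    # low 12 bits, relative to the block
    return reloc_type, block_rva + offset


# 0x3010: type 3 (IMAGE_REL_BASED_HIGHLOW in WINNT.H), offset 0x010.
print(decode_relocation_word(0x1000, 0x3010))
```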
class pefile.ResourceDataEntryData(**args)

Holds resource data entry information.

struct
IMAGE_RESOURCE_DATA_ENTRY structure.
lang
Primary language ID.
sublang
Sublanguage ID.
class pefile.ResourceDirData(**args)

Holds resource directory information.

struct
IMAGE_RESOURCE_DIRECTORY structure.
entries
List of entries as ResourceDirEntryData instances.
class pefile.ResourceDirEntryData(**args)

Holds resource directory entry data.

struct
IMAGE_RESOURCE_DIRECTORY_ENTRY structure.
name
If the resource is identified by name, then this attribute will contain the name string. None otherwise. If identified by ID, then the ID is available at struct.Id.
id
The id, also in struct.Id.
directory
If this entry has a lower-level directory, then this attribute will point to the ResourceDirData instance representing it.
data
If this entry has no further lower-level directories, and points to the actual resource data, then this attribute will reference the corresponding ResourceDataEntryData instance.

Either the directory or the data attribute will exist, but not both.
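Because each entry carries exactly one of directory or data, the resource tree can be walked with a simple recursive generator. The sketch below runs against SimpleNamespace stand-ins that mimic the documented attributes (entries, name, id, directory, data); iter_resource_leaves is a hypothetical helper, and how it behaves on real parsed objects is an assumption that has not been verified here.

```python
from types import SimpleNamespace


def iter_resource_leaves(directory, path=()):
    """Yield (path, data) for every leaf below a resource directory."""
    for entry in directory.entries:
        # Use the name when the entry is named, otherwise its numeric id.
        label = entry.name if entry.name is not None else entry.id
        if hasattr(entry, "directory"):
            yield from iter_resource_leaves(entry.directory, path + (label,))
        elif hasattr(entry, "data"):
            yield path + (label,), entry.data


# Stand-in tree: type 3 -> resource id 1 -> language 1033 -> leaf data.
leaf = SimpleNamespace(lang=9, sublang=1)
lang_dir = SimpleNamespace(entries=[SimpleNamespace(name=None, id=1033, data=leaf)])
id_dir = SimpleNamespace(entries=[SimpleNamespace(name=None, id=1, directory=lang_dir)])
root = SimpleNamespace(entries=[SimpleNamespace(name=None, id=3, directory=id_dir)])

for path, data in iter_resource_leaves(root):
    print(path, data.lang, data.sublang)
```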

class pefile.SectionStructure(*argl, **argd)

Convenience section-handling class.

contains(rva)

Deprecated.

Use contains_rva() instead.

contains_offset(offset)

Check whether the section contains the file offset provided.

contains_rva(rva)

Check whether the section contains the given address.

entropy_H(data)

Calculate the entropy of a chunk of data.

get_data(start=None, length=None)

Get data chunk from a section.

Query data from the section by passing the addresses where the PE file would be loaded by default. It is then possible to retrieve code and data by their real addresses as they would be if loaded.

Returns bytes() for Python 3.x and str() for Python 2.7.

get_entropy()

Calculate and return the entropy of the section.
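As a point of reference, byte entropy is commonly computed as Shannon entropy over the byte-value histogram, giving a value between 0 (one repeated byte) and 8 (all 256 values equally likely). The sketch below assumes that definition; shannon_entropy is a hypothetical name and may differ in detail from the library's own calculation.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a bytes object, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # histogram of byte values
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


print(shannon_entropy(b"aaaa"))
print(shannon_entropy(bytes(range(256))))
```

Values close to 8 usually indicate compressed or encrypted content, which is why section entropy is a popular heuristic for spotting packed executables.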

get_hash_md5()

Get the MD5 hex-digest of the section’s data.

get_hash_sha1()

Get the SHA-1 hex-digest of the section’s data.

get_hash_sha256()

Get the SHA-256 hex-digest of the section’s data.

get_hash_sha512()

Get the SHA-512 hex-digest of the section’s data.

get_offset_from_rva(rva)
get_rva_from_offset(offset)
class pefile.Structure(format, name=None, file_offset=None)

Prepare structure object to extract members from data.

Format is a list containing definitions for the elements of the structure.

all_zeroes()

Returns True if the unpacked data is all zeros.

dump(indentation=0)

Returns a string representation of the structure.

dump_dict()

Returns a dictionary representation of the structure.

get_field_absolute_offset(field_name)

Return the offset within the file for the requested field in the structure.

get_field_relative_offset(field_name)

Return the offset within the structure for the requested field.

sizeof()

Returns the size of the structure.

class pefile.TlsData(**args)

Holds TLS information.

struct
IMAGE_TLS_DIRECTORY structure.
class pefile.UnicodeStringWrapperPostProcessor(pe, rva_ptr)

Attempt to identify strings in plain Unicode or Pascal.

A list of strings will be wrapped on the object with the hope that any overlapping will clarify its type.

ask_unicode_16(next_rva_ptr)

The next RVA is taken to be the one immediately following this one.

Such RVA could indicate the natural end of the string and will be checked to see if there’s a Unicode NULL character there.

decode(*args)
get_pascal_16_length()
get_rva()

Get the RVA of the string.

invalidate()

Make this instance None to express that it is not a known string type.

render_pascal_16()
pefile.is_valid_dos_filename(s)
pefile.is_valid_function_name(s)
pefile.retrieve_flags(flag_dict, flag_filter)

Read the flags from a dictionary and return them in a usable format.

Will return a list of (flag, value) pairs for all flags in flag_dict matching the filter flag_filter.

pefile.set_flags(obj, flag_field, flags)

Process the flags and set attributes in the object accordingly.

The object obj will gain attributes named after the flags provided in flags and valued True/False, matching the results of applying each flag value from flags to flag_field.
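The two helpers can be pictured working together: one filters a constants dictionary down to (flag, value) pairs by name prefix, and the other turns a bit field into boolean attributes. The functions and the SectionStandIn class below are hypothetical sketches written from the descriptions above, not the module's code; the flag values follow WINNT.H.

```python
# Hypothetical subset of a two-way section characteristics dictionary.
SECTION_FLAGS_SKETCH = {
    "IMAGE_SCN_CNT_CODE": 0x00000020,
    "IMAGE_SCN_MEM_EXECUTE": 0x20000000,
    "IMAGE_SCN_MEM_READ": 0x40000000,
    "IMAGE_SCN_MEM_WRITE": 0x80000000,
    0x00000020: "IMAGE_SCN_CNT_CODE",  # reverse entry, skipped by the filter
}


def retrieve_flags_sketch(flag_dict, flag_filter):
    """Return (flag, value) pairs whose name starts with flag_filter."""
    return [
        (name, value)
        for name, value in flag_dict.items()
        if isinstance(name, str) and name.startswith(flag_filter)
    ]


def set_flags_sketch(obj, flag_field, flags):
    """Set a True/False attribute on obj for each flag tested against flag_field."""
    for name, value in flags:
        setattr(obj, name, bool(flag_field & value))


class SectionStandIn:
    """Empty object used only to receive the attributes."""


flags = retrieve_flags_sketch(SECTION_FLAGS_SKETCH, "IMAGE_SCN_")
section = SectionStandIn()
set_flags_sketch(section, 0x60000020, flags)  # code, executable, readable
print(section.IMAGE_SCN_MEM_EXECUTE, section.IMAGE_SCN_MEM_WRITE)
```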