EVC's data collection, extraction, and conversion experts can identify and collect structured, semi-structured, and unstructured data from the Internet to meet your project requirements.

Data Collection, Extraction, Conversion

Data collection, extraction, and conversion for a wide variety of Internet data, tailored to your needs

Enterprise Venture Corporation’s (EVC’s) Open Source Intelligence (OSINT) and collection experts identify and collect structured, semi-structured, and unstructured data from the Internet. Our team collects and converts an average of 12,000 records per hour, depending on the quality of the source data. In addition, our team uses a custom extraction tool for unstructured records. Because a Human-in-the-Loop process verifies fields and content, our approach can be modified rapidly to adapt to changes in data format or content.

Data Collection and Conversion

Using proven screen scraping technologies, we customize strategies for collecting data from electronic resources. Once collected, the data is analyzed for format and quality to identify the steps needed for conversion. These steps can include normalizing phone numbers and addresses to the client’s required format. We extract elements such as email addresses or phone numbers from free-text fields such as “Comments” to populate the appropriate fields, or build relationships between elements, such as “subsidiary” or “owned by.” The same process can be applied to legacy files, preparing historical data for new database formats or improved data mining.
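The normalization and extraction steps described above can be sketched in a few lines of Python. This is a minimal illustration, not EVC’s tooling: it assumes 10-digit US phone numbers, a target format of (NNN) NNN-NNNN, and hypothetical field names `phone`, `email`, and `comments`. Relationship building (“subsidiary,” “owned by”) is not shown.

```python
import re
from typing import Optional

# Assumed target format for 10-digit US numbers: (NNN) NNN-NNNN.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize_phone(raw: str) -> Optional[str]:
    """Return a US number as (NNN) NNN-NNNN, or None if it is not 10 digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def clean_record(record: dict) -> dict:
    """Fill empty 'email'/'phone' fields from free-text 'comments', then normalize the phone."""
    out = dict(record)
    comments = out.get("comments") or ""
    if not out.get("email"):
        match = EMAIL_RE.search(comments)
        if match:
            out["email"] = match.group(0)
    raw_phone = out.get("phone")
    if not raw_phone:
        match = PHONE_RE.search(comments)
        raw_phone = match.group(0) if match else None
    if raw_phone:
        # Keep values that do not parse so a reviewer can inspect them.
        out["phone"] = normalize_phone(raw_phone) or raw_phone
    return out


record = {
    "name": "Jane Doe",
    "phone": "",
    "email": "",
    "comments": "Call Jane at 303.555.0142 or jane.doe@example.com",
}
print(clean_record(record))
```

Values that cannot be normalized are left in place rather than discarded, so they can be routed to a human reviewer instead of being silently lost.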

Unstructured Data Extraction

Data is frequently contained in documents that are not suitable for Natural Language Processing (NLP). Examples include documents with sections written entirely in uppercase letters, or with limited punctuation or grammar. Our Human-in-the-Loop process handles a wide variety of document contents and formats. We first identify the elements critical to the client, then develop customized extraction scripts and routines to capture them. Once elements are initially tagged, our data conversion team reviews the field content and tags for accuracy. For example, is “Denver” the name of a city or the name of a person? Our team reviews the surrounding context to tag and format the data correctly. The results are formatted to feed into other analytical tools, such as Palantir, or into an existing database.
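A Human-in-the-Loop review step like the one described above can be sketched as a first-pass tagger that routes ambiguous terms to an analyst. This is an illustrative sketch only: the gazetteers `CITIES` and `PERSON_NAMES`, the labels, and the functions `tag_document` and `apply_review` are hypothetical, and simple list lookup stands in for whatever extraction scripts a real project would use. Case-folding lets the lookup work on sections written in all caps, where capitalization offers no clue.

```python
from dataclasses import dataclass, field

# Hypothetical gazetteers for illustration; a real project would use
# client-specific reference lists.
CITIES = {"denver", "boston", "austin"}
PERSON_NAMES = {"denver", "austin", "maria"}

PUNCTUATION = ".,;:()\"'"


@dataclass
class Tag:
    token: str     # text as it appeared in the document
    label: str     # "CITY", "PERSON", or "AMBIGUOUS"
    position: int  # token index within the document


@dataclass
class TaggingResult:
    tags: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)  # tags a human must resolve


def tag_document(text: str) -> TaggingResult:
    """First-pass tagging; terms found in both lists go to human review."""
    result = TaggingResult()
    for position, raw in enumerate(text.split()):
        surface = raw.strip(PUNCTUATION)
        key = surface.lower()  # case-fold so ALL-CAPS sections still match
        in_city, in_person = key in CITIES, key in PERSON_NAMES
        if in_city and in_person:
            label = "AMBIGUOUS"
        elif in_city:
            label = "CITY"
        elif in_person:
            label = "PERSON"
        else:
            continue
        tag = Tag(surface, label, position)
        result.tags.append(tag)
        if label == "AMBIGUOUS":
            result.review_queue.append(tag)
    return result


def apply_review(result: TaggingResult, decisions: dict) -> TaggingResult:
    """Apply analyst decisions (token position -> label) to queued tags."""
    still_pending = []
    for tag in result.review_queue:
        if tag.position in decisions:
            tag.label = decisions[tag.position]  # same object as in result.tags
        else:
            still_pending.append(tag)
    result.review_queue = still_pending
    return result


result = tag_document("MEETING WITH DENVER IN BOSTON.")
# An analyst reads the context and decides "DENVER" refers to a person here.
result = apply_review(result, {2: "PERSON"})
print([(t.token, t.label) for t in result.tags])
```

Terms that appear in both lists, such as “Denver,” are never labeled automatically; they stay in the review queue until an analyst reads the context and assigns a label.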