--- changes per libwhisker release -------------------------------------

[] libwhisker 2.1

- Sullo pointed out that $LW_HAS_SSL has disappeared.  Forgot to document
  that.  Use $LW_SSL_LIB instead.

- Changed a (len!=0) to (len>0) check in the chunk decoder, to be more
  robust.

- Added html_link_extractor() function, which uses code already present
  in the crawl module.

- The regex was a bit broken in encode_uri_randomhex().  Pointed out by
  John McDonald.

- John also found a typo in encode_anti_ids(), causing it to call the
  non-existent function encode_randomase().

- New Makefile.pl build environment.

- Fixed a bug in forms_read() and _forms_callback() which prevented the
  proper storage of multiple forms.

----------------------------------------------------------------------------

[] libwhisker 2.0

- Libwhisker 2.0 is officially dubbed LW2.  Below are the incompatible
  changes from libwhisker 1.x.  There were lots of general changes, but
  only the non-backwards-compatible ones are documented.

- The following were renamed:

    {whisker}->{req_spacer*}             => {whisker}->{http_space*}
    {whisker}->{http_ver}                => {whisker}->{version}
    {whisker}->{http_protocol}           => {whisker}->{protocol}
    {whisker}->{uri_param}               => {whisker}->{parameters}
    {whisker}->{recv_header_order}       => {whisker}->{header_order}
    {whisker}->{http_resp_message}       => {whisker}->{message}
    {whisker}->{INITIAL_MAGIC}           => {whisker}->{MAGIC}
    {whisker}->{sockstate}               => {whisker}->{socket_state}
    utils_lowercase_(hashkeys|headers)   => utils_lowercase_keys
    utils_split_uri                      => uri_split
    utils_join_uri                       => uri_join
    utils_normalize_uri                  => uri_normalize
    utils_absolute_uri                   => uri_absolute
    utils_get_dir                        => uri_get_dir
    utils_unidecode_uri                  => decode_unicode
    anti_ids                             => encode_anti_ids
    bruteurl                             => utils_bruteurl
    auth_set_header                      => auth_set
    encode_str2uri                       => encode_uri_hex
    encode_str2ruri                      => encode_uri_randomhex
    dumper                               => dump
    dumper_writefile                     => dump_writefile

- The following are now deprecated (along with their functionality):

    {whisker}->{method_postfix}
    {whisker}->{http_req_trailer}
    {whisker}->{queue_md5}               (use {request_fingerprint})
    {whisker}->{http_resp}               (use {code})
    {whisker}->{retry_errors}
    {whisker}->{ids_session_splice}
    do_auth                              (use auth_set)
    upload_file
    download_file                        (use get_page_to_file)
    md5_perl                             (use md5)
    md4_perl                             (use md4)
    (en|de)code_base64_perl              (use (en|de)code_base64)
    crawl_get_config
    crawl_set_config

- {whisker}->{parameters} will not be included if it's an empty string.

- {whisker}->{normalize_incoming_headers} now changes AA-Bb-cc-dD to
  Aa-Bb-Cc-Dd, instead of the prior AA-Bb-Cc-DD.

- The invalid HTTP response error message no longer includes the invalid
  response (but it's still in {whisker}->{data}).

- IDS session splicing is deprecated.  Most IDSes do stream reassembly
  anyway, so this is not a big loss.  The deprecation is due to
  limitations of the current stream implementation.  It will reappear in
  future versions.

- cookie_* now operates independently of the actual Set-Cookie header.
  http_do_request now has internal magic, so that all cookies are saved
  and processed regardless of header capitalization, normalization, and
  duplication (including the default ignore_duplicate_headers).

- Lots of the global variables were changed/renamed or removed.  See
  globals.pl for details.

- Crawl was completely rebuilt to be more object-ish (the use of so many
  global variables made it hard to have multiple crawl sessions going at
  once).  If you were using crawl(), then you will need to review the new
  way of calling crawl() and accessing related data.  All the crawl data
  structures (and locations) were changed, as was the format for
  configuring the crawler and callbacks.

- Dumper() returns undef on error, instead of the string 'ERROR'.

- html_find_tags() takes a few more optional parameters.  Using a tag map
  can lead to speed increases by reducing the number of times the
  callback function is actually called.

- The libwhisker 1.x series did not properly generate forms structures
  (via forms_read()).  It was corrected, but the generated structure,
  while now accurate per documentation, is not backwards-compatible.

- Authorization is now handled via auth_set(), and not merely by the
  presence of the Authorization header.  Also, the internal
  {whisker}->{ntlm_*} keys relating to NTLM authentication have been
  deleted.  You shouldn't have been using them anyway. :)

- Socket timeout values are read from {whisker}->{timeout}, and are saved
  per stream.  The global $TIMEOUT variable no longer exists.

- HTML rewriting via html_find_tags() is now done by calling
  html_find_tags_rewrite() within your callback function.  The return
  value of the callback is ignored (and thus not required, unlike LW1.x).

- auth_set() will now call http_reset() whenever any NTLM-based
  authentication is used.  This is because NTLM is a connection-based
  authentication, and thus all connections need to start from scratch
  when NTLM is enabled.

- The ETag header is now normalized to ETag, and not Etag.

- All new POD documentation, which follows the more standard format for
  use with pod2man.

- utils_find_lowercase_keys() will now dereference multi-value entries
  and return a full array if it is called in array context.

- A bug in Crypt::SSLeay (Net::SSL) 0.51 (and probably prior) causes it
  to puke when it is used in proxy mode.  Hopefully it will be fixed in
  future versions.

- Turns out the Net::SSLeay implementation of MD5 was returning bad
  hashes (it truncated them at the first NUL byte).  Use of
  Net::SSLeay::md5 has been discontinued permanently.  Use $LW_SSL_LIB
  instead.
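
Note on the MD5 truncation item above: the following is a minimal Python
sketch of that failure mode, not libwhisker or Net::SSLeay code.  It
assumes the bad hashes came from the raw 16-byte digest being read as a
NUL-terminated C string, so everything from the first zero byte onward
was lost.  The helper names c_string_view() and
find_input_with_nul_in_digest() are hypothetical and exist only for this
illustration.

```python
import hashlib


def c_string_view(raw: bytes) -> bytes:
    """Return what a NUL-terminated C string reader would see of raw."""
    return raw.split(b"\x00", 1)[0]


def find_input_with_nul_in_digest(limit: int = 100000) -> bytes:
    """Try small inputs until one has an MD5 digest containing a NUL byte."""
    for i in range(limit):
        candidate = str(i).encode("ascii")
        if b"\x00" in hashlib.md5(candidate).digest():
            return candidate
    raise RuntimeError("no suitable input found within limit")


# Pick an input whose raw digest happens to contain a zero byte.
data = find_input_with_nul_in_digest()
full = hashlib.md5(data).digest()   # correct raw digest, always 16 bytes
seen = c_string_view(full)          # what survives C-string handling

print("input:          ", data)
print("full digest:    ", full.hex())
print("truncated view: ", seen.hex())
print("bytes lost:     ", len(full) - len(seen))
```

Only some inputs are affected (those whose digest contains a zero byte),
which is why a bug of this kind can go unnoticed for a while: most hashes
come back intact, and the bad ones are simply shorter than 16 bytes.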