Parse any date with timezone in JavaScript

Asked by: rolling_codes Asked: 4/23/2023 Last edited by: rolling_codes Updated: 5/21/2023 Views: 186

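Q:

I have been looking for an existing solution but have not found one that handles this scenario: I want to be able to parse dates that may be malformed. The dates are scraped from web pages around the world, and unfortunately only about half of those pages are kind enough to provide a properly formatted ISO string.

For the following examples, is my only option to parse the input with regular expressions and then reconstruct the date?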

// invalid date
new Date('Apr 21, 2023,06:51 pm EDT');

// invalid date
new Date('Apr 21, 2023 06:51pm EDT');

// invalid date
new Date('Apr 21, 2023 06:51 p.m EDT');

// invalid date
new Date('Apr 21, 2023 06.51 pm EDT'); 

// invalid date
new Date('Apr 21, 2023 06:51 pm ET'); 

// invalid date
new Date('06:51 pm Apr 21, 2023 EDT'); 

// invalid date
new Date('Apr 21st 2023 EDT'); 

// invalid date
new Date('7th Feb 2023');

// invalid date
new Date('3 hours ago');

// invalid date
new Date('10m ago');

// invalid date (haven't solved this one yet)
new Date('Thursday 10:30PM EST');

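My current solution first scans the string for date digits plus an optional timezone, then for time digits plus an optional timezone, and finally rebuilds the date string in ISO format. That feels like a lot of boilerplate for what seems to be a very common requirement. Is there a simpler way to do this?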

import ms from 'ms';

export const TIME_EXPR = /(\d\d?)\s*(a\.?m\.?|p\.?m\.?)|(\d\d?)[.:](\d\d?)(?:[.:](\d\d?))?(?:\s*(a\.?m\.?|p\.?m\.?))?(?:.*(?:\s*(ACDT|ACS?T|AES?T|AKDT|AKS?T|BS?T|CES?T|CDT|CS?T|EDT|ES?T|IS?T|JS?T|MDT|MSK|NZS?T|PDT|PS?T|UTC)))?/i;

export const DATE_EXPR = /(\d\d?\s*(?:h|h(?:ou)?rs?|m|min(?:ute)?s?))\s*ago|(?:(\d\d?)([-./])(\d\d?)\3(\d{4}|\d{2})|(\d{4})([-./])(\d\d?)\7(\d\d?)|(?:(\d\d?)(?:st|nd|rd|th)?\s*)?(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)(?:\s+(\d\d?)(?:st|nd|rd|th)?)?(?:[,\s]\s*(\d{4})?))(?:.*(?:\s*(ACDT|ACS?T|AES?T|AKDT|AKS?T|BS?T|CES?T|CDT|CS?T|EDT|ES?T|IS?T|JS?T|MDT|MSK|NZS?T|PDT|PS?T|UTC)))?|(\d\d\d\d+)/i;

export function monthToString(month: number | string) {
  const m = parseInt(`${month}`);
  if (Number.isNaN(m)) {
    return month;
  }
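  // Use numeric arguments: a string such as `2023-4-07` is not ISO-formatted, so parsing it is implementation-dependent.
  return new Date(2023, m - 1, 7).toLocaleString('en-US', { month: 'long' });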
}

export function parseDate(context?: string) {
  if (!context) {
    return new Date('invalid');
  }
  const date = new Date(context.trim());
  if (!Number.isNaN(date.valueOf()) && date.valueOf() > 0) {
    return date;
  }
  const dateMatches = context.match(DATE_EXPR);
  const timeMatches = context.match(TIME_EXPR);
  let year = String(new Date().getFullYear());
  let month = monthToString(new Date().getMonth() + 1);
  let day = String(new Date().getDate());
  let hour = 0, min = 0, sec = 0, amOrPm = '';
  let timezone = '';
  if (!dateMatches) {
    return new Date('invalid');
  }
  const [_0, relative, month1, _3, day1, year1, year2, _7, month2, day2, day3, month3, day4, year3, tz = '', timestamp] = dateMatches;
  if (relative) {
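    // Normalize units to the short forms `h` and `m`; the whole word must be consumed, otherwise "minutes" becomes "mutes".
    return new Date(Date.now() - ms(relative.replace(/h(?:ou)?rs?/, 'h').replace(/m(?:in(?:ute)?s?)?/, 'm')));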
  }
  const datetime = parseInt(timestamp);
  if (!Number.isNaN(datetime)) {
    if (!Number.isNaN(new Date(datetime).valueOf())) {
      return new Date(datetime);
    }
  }
  timezone = tz;
  year = year1 ?? year2 ?? year3 ?? String(new Date().getFullYear());
  month = monthToString(month1 ?? month2 ?? month3 ?? new Date().getMonth() + 1);
  day = day1 ?? day2 ?? day3 ?? day4 ?? String(new Date().getDate());
  if (timeMatches) {
    const [_0, hour1, amOrPm1, hour2, min1, sec1, amOrPm2, timezone2] = timeMatches;
    hour = !Number.isNaN(parseInt(hour1 ?? hour2)) ? parseInt(hour1 ?? hour2) : 0;
    min = !Number.isNaN(parseInt(min1)) ? parseInt(min1) : 0;
    sec = !Number.isNaN(parseInt(sec1)) ? parseInt(sec1) : 0;
    amOrPm = (amOrPm1 ?? amOrPm2 ?? '').replace(/\./g, '');
    if (timezone2) {
      timezone = timezone2;
    }
  }
  const dateMatch = [`${month} ${day}, ${String(year).length === 2 ? `20${year}` : year} ${hour}:${min}:${sec} ${amOrPm ? amOrPm : (hour < 12) ? 'am' : 'pm'}`, timezone.replace(/^(A[CEK]|CE|NZ|[BCEIJP])T$/, ($0, $1) => `${$1}ST`)].join(' ');
  const parsedDate = new Date(dateMatch);
  return parsedDate;
}

export function sortDates(...dates: Date[]) {
  return [...dates].filter((d) => !Number.isNaN(d.valueOf())).sort((a, b) => { 
    return a.valueOf() - b.valueOf();
  });
}

export function minDate(...dates: Date[]) {
  const sortedDates = sortDates(...dates);
  if (sortedDates.length === 0) {
    return new Date('invalid');
  }
  return sortedDates[0];
}

export function maxDate(...dates: Date[]) {
  const sortedDates = sortDates(...dates);
  if (sortedDates.length === 0) {
    return new Date('invalid');
  }
  return sortedDates[sortedDates.length - 1];
}
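
Tags: javascript, regex, typescript, date, parsing

Comments

3 votes Pointy 4/23/2023
JavaScript is only guaranteed to recognize ISO standard dates. Various implementations also parse other formats, but that is luck of the draw. Use a library or write your own date format parser.
1 vote rsp 4/23/2023
"I can't even find a node library that handles this kind of scenario": what did you try?
0 votes GarfieldKlon 4/23/2023
I would try to fix this at the place where the malformed dates are produced.
1 vote Evert 4/23/2023
There are plenty of date-handling libraries. Luxon is a good current choice.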

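Answers:

-1 votes Dimava 4/23/2023 #1

Have you tried https://www.npmjs.com/package/any-date-parser?

It parses a wide range of date formats, including dates typed in by humans.

any-date-parser has an addFormat() function for adding custom parsers.

Supported formats:

  • 24 hour time
  • 12 hour time
  • timezone offsets
  • timezone abbreviations
  • several orderings of year, month (numeric or by name), and day
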
Comments

0 votes RobG 4/23/2023
This should be a comment, not an answer.

0 votes trincot 4/23/2023 #2

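The best approach is to have the dates formatted in a standard way at the source (ECMAScript's Date Time String Format). If that is not possible, then you do indeed need to parse the input (or have a library do it for you).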
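
When the producing side is itself JavaScript, one way to emit that standard format is Date.prototype.toISOString(). The following is only a minimal sketch under that assumption; the instant used is an illustrative value borrowed from the question's examples (6:51 pm EDT corresponds to 22:51 UTC).

```javascript
// Illustrative instant: Apr 21, 2023, 6:51 pm EDT, which is 22:51 UTC.
const published = new Date(Date.UTC(2023, 3, 21, 22, 51));

// toISOString() emits the ECMAScript Date Time String Format, always in UTC.
const serialized = published.toISOString();
console.log(serialized); // "2023-04-21T22:51:00.000Z"

// A string in this format parses back to the same instant.
const roundTripped = new Date(serialized);
console.log(roundTripped.getTime() === published.getTime()); // true
```

Strings serialized this way need no custom parsing on the consuming side.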

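"I have a super complex regular expression..."

Maybe you can build it up in steps so that it stays manageable? Also, if you intended the regular expression to perform numeric validation, you could drop that part and leave the validation to the Date constructor.

Here is a possible implementation with the following characteristics:

  • It does not focus on validation; it generously matches out-of-range numbers.
  • It allows any punctuation between components (anything matching \W+).
  • It uses a lookup object to map known timezone codes to timezone offsets.
  • It produces a string in ECMAScript's Date Time String Format, provided the input components are valid (in range).
  • It leaves it to the caller to pass that date-time string to the Date constructor or to Date.parse.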

const zones = {aoe:'-12:00',y:'-12:00',nut:'-11:00',sst:'-11:00',x:'-11:00',ckt:'-10:00',hst:'-10:00',taht:'-10:00',w:'-10:00',mart:'-09:30',akst:'-09:00',gamt:'-09:00',hdt:'-09:00',v:'-09:00',akdt:'-08:00',pst:'-08:00',pst:'-08:00',u:'-08:00',mst:'-07:00',pdt:'-07:00',t:'-07:00',cst:'-06:00',east:'-06:00',galt:'-06:00',mdt:'-06:00',s:'-06:00',act:'-05:00',cdt:'-05:00',cist:'-05:00',cot:'-05:00',cst:'-05:00',easst:'-05:00',ect:'-05:00',est:'-05:00',pet:'-05:00',r:'-05:00',amt:'-04:00',ast:'-04:00',bot:'-04:00',cdt:'-04:00',cidst:'-04:00',clt:'-04:00',edt:'-04:00',fkt:'-04:00',gyt:'-04:00',pyt:'-04:00',q:'-04:00',vet:'-04:00',nst:'-03:30',adt:'-03:00',amst:'-03:00',art:'-03:00',brt:'-03:00',clst:'-03:00',fkst:'-03:00',gft:'-03:00',p:'-03:00',pmst:'-03:00',pyst:'-03:00',rott:'-03:00',srt:'-03:00',uyt:'-03:00',warst:'-03:00',wgt:'-03:00',ndt:'-02:30',brst:'-02:00',fnt:'-02:00',gst:'-02:00',o:'-02:00',pmdt:'-02:00',uyst:'-02:00',wgst:'-02:00',azot:'-01:00',cvt:'-01:00',egt:'-01:00',n:'-01:00',
    utc:'+00:00',azost:'+00:00',egst:'+00:00',gmt:'+00:00',wet:'+00:00',wt:'+00:00',z:'+00:00',a:'+01:00',bst:'+01:00',cet:'+01:00',ist:'+01:00',wat:'+01:00',west:'+01:00',wst:'+01:00',b:'+02:00',cat:'+02:00',cest:'+02:00',eet:'+02:00',ist:'+02:00',sast:'+02:00',wast:'+02:00',ast:'+03:00',c:'+03:00',eat:'+03:00',eest:'+03:00',fet:'+03:00',idt:'+03:00',msk:'+03:00',syot:'+03:00',trt:'+03:00',irst:'+03:30',adt:'+04:00',amt:'+04:00',azt:'+04:00',d:'+04:00',get:'+04:00',gst:'+04:00',kuyt:'+04:00',msd:'+04:00',mut:'+04:00',ret:'+04:00',samt:'+04:00',sct:'+04:00',aft:'+04:30',irdt:'+04:30',amst:'+05:00',aqtt:'+05:00',azst:'+05:00',e:'+05:00',mawt:'+05:00',mvt:'+05:00',orat:'+05:00',pkt:'+05:00',tft:'+05:00',tjt:'+05:00',tmt:'+05:00',uzt:'+05:00',yekt:'+05:00',ist:'+05:30',npt:'+05:45',
    almt:'+06:00',bst:'+06:00',btt:'+06:00',f:'+06:00',iot:'+06:00',kgt:'+06:00',omst:'+06:00',qyzt:'+06:00',vost:'+06:00',yekst:'+06:00',cct:'+06:30',mmt:'+06:30',cxt:'+07:00',davt:'+07:00',g:'+07:00',hovt:'+07:00',ict:'+07:00',krat:'+07:00',novst:'+07:00',novt:'+07:00',omsst:'+07:00',wib:'+07:00',awst:'+08:00',bnt:'+08:00',cast:'+08:00',chot:'+08:00',cst:'+08:00',h:'+08:00',hkt:'+08:00',hovst:'+08:00',irkt:'+08:00',krast:'+08:00',myt:'+08:00',pht:'+08:00',sgt:'+08:00',ulat:'+08:00',wita:'+08:00',pyt:'+08:30',acwst:'+08:45',awdt:'+09:00',chost:'+09:00',i:'+09:00',irkst:'+09:00',jst:'+09:00',kst:'+09:00',pwt:'+09:00',tlt:'+09:00',ulast:'+09:00',wit:'+09:00',yakt:'+09:00',acst:'+09:30',aest:'+10:00',chut:'+10:00',chst:'+10:00',ddut:'+10:00',k:'+10:00',pgt:'+10:00',vlat:'+10:00',yakst:'+10:00',yapt:'+10:00',acdt:'+10:30',lhst:'+10:30',aedt:'+11:00',bst:'+11:00',kost:'+11:00',l:'+11:00',lhdt:'+11:00',magt:'+11:00',nct:'+11:00',nft:'+11:00',pont:'+11:00',sakt:'+11:00',sbt:'+11:00',sret:'+11:00',vlast:'+11:00',vut:'+11:00',anast:'+12:00',anat:'+12:00',fjt:'+12:00',gilt:'+12:00',m:'+12:00',magst:'+12:00',mht:'+12:00',nfdt:'+12:00',nrt:'+12:00',nzst:'+12:00',petst:'+12:00',pett:'+12:00',tvt:'+12:00',wakt:'+12:00',wft:'+12:00',chast:'+12:45',fjst:'+13:00',nzdt:'+13:00',phot:'+13:00',tkt:'+13:00',tot:'+13:00',wst:'+13:00',chadt:'+13:45',lint:'+14:00',tost:'+14:00'};
// Note: some codes (e.g. cst, cdt, ist, bst, ast) occur more than once above. In an object literal
// the last entry wins, so here cst resolves to '+08:00' and cdt to '-04:00'.
const monthRe = "(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)";
const dayRe = "(\\d\\d?)[stnrdh]{0,2}";
const dateRe = `(?:${monthRe}\\W+${dayRe}|${dayRe}\\W+${monthRe})\\W+(\\d{4})`;
const timeRe = "(\\d\\d?):(\\d\\d)(?:\\W*([ap])\\.?m)?(?:\\W*([a-z]{1,5}))?";
const regex = RegExp(`^\\W*${dateRe}(?:\\W+${timeRe})?\\W*$`, "i");

function toDateTimeStringFormat(s) {
    const match = s.toLowerCase().match(regex);
    if (!match) return;
    let [, m1, day, d2, m2, year, hour, minute, pm, zone] = match;
    const month = 1 + (monthRe.indexOf(m1 ?? m2) >> 2); // month name to number
    day ??= d2;
    zone = zones[zone] ?? ""; // timezone code to offset
    hour ??= "0";
    minute ??= "0";
    if (pm) hour = String((+hour % 12) + 12 * (pm == "p")); // to 24h range
    return `${year}-${month}-${day}T${hour}:${minute}${zone}`
           .replace(/(?<!\d)\d(?!\d)/g, "0$&"); // pad single digit numbers
}

const tests = [
    'Apr 21, 2023,06:51 pm EDT',
    'Apr 21, 2023 06:51pm EDT',
    'Apr 21, 2023 06:51 pm EDT',
    '7th Feb 2023',
    'Dec 6th, 2022 19:56 CET',
    'Jan 3rd, 2021; 0:04(IST)',
];

for (const test of tests) console.log(toDateTimeStringFormat(test));
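
As a sketch of the caller's side, the strings below are shaped like the output of toDateTimeStringFormat and are passed to Date.parse, which is where the range validation happens. The helper name toDateOrNull and the sample strings are assumptions made for this illustration only.

```javascript
// Strings shaped like the function's output: one in range, one with hour 25.
const candidates = [
  "2023-04-21T18:51-04:00",
  "2023-04-21T25:51-04:00",
];

// Hypothetical helper: returns a Date for a valid date-time string,
// or null when Date.parse rejects it (it returns NaN in that case).
function toDateOrNull(dateTimeString) {
  const timestamp = Date.parse(dateTimeString);
  return Number.isNaN(timestamp) ? null : new Date(timestamp);
}

const results = candidates.map(toDateOrNull);
console.log(results[0].toISOString()); // "2023-04-21T22:51:00.000Z"
console.log(results[1]);               // null
```

The out-of-range hour passes the regular expression but is rejected here, which is the division of labour described in the list above.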

Of course, this is limited and will need to be extended if more input formats have to be supported.
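
One such extension could cover the relative inputs from the question ('3 hours ago', '10m ago'). The sketch below is not part of the implementation above: parseRelative, relativeRe and UNIT_MS are hypothetical names, and only hours and minutes are handled.

```javascript
// Milliseconds per supported unit, keyed by the unit's first letter.
const UNIT_MS = { h: 3600000, m: 60000 };

// Matches e.g. "3 hours ago", "3 hrs ago", "10m ago", "5 minutes ago".
const relativeRe = /^\s*(\d+)\s*(h(?:(?:ou)?rs?)?|m(?:in(?:ute)?s?)?)\s+ago\s*$/i;

// Hypothetical helper: returns a Date that lies the given amount of time
// before `now`, or undefined when the text is not a relative expression.
function parseRelative(text, now = new Date()) {
  const match = text.match(relativeRe);
  if (!match) return;
  const amount = Number(match[1]);
  const unit = match[2][0].toLowerCase(); // "h" or "m"
  return new Date(now.getTime() - amount * UNIT_MS[unit]);
}

// A fixed reference time keeps the example deterministic.
const reference = new Date("2023-04-23T12:00:00Z");
console.log(parseRelative("3 hours ago", reference).toISOString()); // "2023-04-23T09:00:00.000Z"
console.log(parseRelative("10m ago", reference).toISOString());     // "2023-04-23T11:50:00.000Z"
console.log(parseRelative("7th Feb 2023", reference));              // undefined
```

A caller could try a helper like this first and fall back to the absolute-date parser when it returns undefined.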

Comments

0 votes rolling_codes 4/24/2023
Yes, I went with a custom implementation. Not my first choice, but it does what I need.